Gene cloning

ABSTRACT

There is provided a method of targeted cloning of genes and gene clusters by directly isolating the DNA of interest from a mixed population, thereby permitting the construction of a very targeted, highly enriched library. Also provided are several unique methods for cloning the genes provided by this method and the probes used in connection with this method.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates generally to methods and materials for use in gene cloning. More specifically, the present invention relates to gene probes/primers for use in discovery and characterization of bioactive compound coding genes and gene clusters.

[0003] 2. Description of Related Art

[0004] The basic challenges in drug discovery are to identify a lead compound with desirable activity, and to optimize the lead compound to meet criteria required to proceed with further drug development. One common approach to drug discovery involves presenting macromolecules implicated in causing a disease (disease targets) in bioassays in which potential drug candidates are tested for therapeutic activity. Such molecules could be receptors, enzymes or transcription factors.

[0005] Another approach involves presenting whole cells or organisms that are representative of the causative agent of the disease. Such agents include bacteria and tumor cell lines.

[0006] Traditionally, there are two sources of potential drug candidates: collections of natural products and synthetic chemicals. Identification of lead compounds has been achieved by random screening of such collections, which encompass as broad a range of structural types as possible. The recent development of synthetic combinatorial chemical libraries will further increase the number and variety of compounds available for screening. However, the diversity in any synthetic chemical library is limited by human imagination and skills of synthesis.

[0007] Random screening of natural products from sources such as terrestrial bacteria, fungi, invertebrates and plants has resulted in the discovery of many important drugs (Franco et al. 1991, Critical Rev Biotechnol 11:193-276; Goodfellow et al. 1989, in “Microbial Products: New Approaches”, Cambridge University Press, pp. 343-383; Berdy 1974, Adv Appl Microbiol 18:309-406; Suffness et al. 1988, in Biomedical Importance of Marine Organisms, D. G. Fautin, California Academy of Sciences, pages 151-157). More than 10,000 of these natural products are biologically active and at least 100 of these are currently in use spanning the entire therapeutic spectrum, including antibiotics, anti-cancer agents, and cardiovascular agents, and also as agrochemicals. The success of this approach of drug discovery depends heavily on how many compounds enter a screening program and how efficiently the screening can be conducted. Thus, indication-specific compound libraries have tremendous advantages to this end.

[0008] Typically, pharmaceutical companies screen compound collections containing hundreds of thousands of natural and synthetic compounds. However, the ratio of novel to previously-discovered compounds has diminished with time. In screens for anti-cancer agents, for example, most of the microbial species which are biologically active can yield compounds that are already characterized. This is due partly to the difficulties of consistently and adequately finding, reproducing and supplying novel natural product samples. Since biological diversity is largely due to underlying molecular diversity, there is insufficient biological diversity in the organisms currently selected for random screening, which reduces the probability that novel compounds will be isolated.

[0009] Novel bioactivity has consistently been found in various natural sources. See, for example, Cragg et al., 1994 (in “Ethnobotany and the search for new drugs”, Wiley, Chichester, pp. 178-196). Few of these sources have been explored systematically and thoroughly for novel drug leads. For example, it has been estimated that only 5000 plant species have been studied exhaustively for possible medical use. This is a minor fraction of the estimated total of 250,000-3,000,000 species, most of which grow in the tropics (Abelson 1990, Science 247:513). Moreover, out of the estimated millions of species of marine microorganisms, only a small number have been characterized. Indeed, there is tremendous biodiversity that remains untapped as a source of lead compounds. Conventional methods of compound discovery from these sources depend on the successful laboratory culture of the microbial flora, a practice that is only approximately 1% efficient. Thus, the vast majority of environmental microorganisms cannot be grown in a laboratory and therefore, any potential bioactive compounds that they produce cannot be assayed.

[0010] Terrestrial microorganisms, fungi, invertebrates and plants have historically been used as sources of natural products. However, apart from several well-studied groups of organisms, such as the actinomycetes, which have been developed for drug screening and commercial production, reproducibility and production problems still exist. For example, the antitumor agent, TAXOL™, is a constituent of the bark of mature Pacific yew trees, and its supply as a clinical agent has caused concern about damage to the local ecological system. Taxol contains 11 chiral centers with 2048 possible diastereoisomeric forms so that its de novo synthesis on a commercial scale seems unlikely (Phillipson, 1994, Trans Royal Soc Trop Med Hyg 88 Supp 1:17-19).

[0011] Marine invertebrates are a promising source of novel compounds but there exist major weaknesses in the technology for conducting drug screens and large-scale resupply. For instance, marine invertebrates can be difficult to recollect, and many have seasonal variability in natural product content.

[0012] Marine microorganisms are a promising source of novel compounds but there also exist major weaknesses in the technology for conducting drug screens and industrial fermentation with marine microorganisms. For instance, marine microorganisms are difficult to collect, establish and maintain in culture, and many have specialized nutrient requirements. A reliable source of unpolluted seawater is generally essential for fermentation. It is estimated that at least 99% of marine bacteria species do not survive on laboratory media. Furthermore, available commercial fermentation equipment is not optimal for use in saline conditions, or under high pressure.

[0013] Certain compounds appear in nature only when specific organisms interact with each other and the environment. Pathogens can alter plant gene expression and trigger synthesis of compounds, such as phytoalexins, that enable the plant to resist attack. For example, the wild tobacco plant Nicotiana sylvestris increases its synthesis of alkaloids when under attack from larvae of Manduca sexta. Likewise fungi can respond to phytoalexins by detoxification or preventing their accumulation. Such metabolites will be missed by traditional high-throughput screens, which do not evaluate a fungus together with its plant host. A dramatic example of the influence of the natural environment on an organism is seen with the poison dart frog. While a lethal dose of the sodium channel agonist alkaloid, batrachotoxin, can be harvested by rubbing the tip of a blow dart across the glandular back of a field specimen, batrachotoxin could not be detected in second generation terrarium-reared frogs (Daly, 1995, Proc. Natl. Acad. Sci. 92:9-13). If only traditional drug screening technologies are applied, potentially valuable molecules such as these can never be discovered. Additionally, plant and vertebrate microbial symbionts can sometimes independently biosynthesize bioactive compounds originally discovered from the host plant. In fact, in many cases (e.g. taxanes), the symbiont population produces a much wider range of related compounds. It is believed that similar biosynthetic pathways exist in both host and symbiont as a result of horizontal gene transfer. Thus, symbiont microorganisms represent a virtually untapped source of novel natural products, but only if new methods are made available that can overcome the limitations of conventional methods in fermentation/culturing and discovery of compounds from environmental microorganisms.

[0014] Moreover, a lead compound discovered through random screening rarely becomes a drug, since its potency, selectivity, bioavailability or stability may not be adequate. Typically, a certain quantity of the lead compound is required so that it can be modified structurally to improve its initial activity. However, current methods for synthesis and development of lead compounds from natural sources, especially plants, are relatively inefficient. There are significant obstacles associated with various stages of drug development, such as recollection, growth of the drug-producing organism, dereplication, strain improvement, media improvement, and scale-up production. These problems delay clinical testing of new compounds and affect the economics of using these new sources of drug leads.

[0015] At present, the above-mentioned marine, botanical and animal sources of natural products are underused. Currently available methods for producing and screening lead compounds cannot be applied efficiently to these under-explored sources. Unlike some terrestrial bacteria and fungi, these drug-producing organisms are not readily amenable to industrial fermentation technologies. Simultaneously, the pressure for finding novel sources for drugs is intensified by new high-efficiency and high-throughput screening technologies. Therefore, there is a general need for methods of harnessing the genetic resources and chemical diversity of these as yet untapped sources of compounds for the purpose of drug discovery. Discovery through microbial symbionts offers one possibility if methods can be developed that overcome limitations inherent in conventional discovery from environmental microorganisms.

[0016] Most recent drug discovery programs have shifted to mechanism-based discovery screens. Once a molecular target is identified (e.g., a hormone receptor involved in regulating the disease), assays are designed to identify and/or synthesize therapeutic agents that interact at a molecular level with the target.

[0017] Gene expression libraries are used to identify, investigate and produce the target molecules. Expression cloning has become a conventional method for obtaining the target gene encoding a single protein without knowing the protein's physical properties.

[0018] Many proteins identified by screening gene expression libraries prepared from human and mammalian tissues are potential disease targets, e.g., receptors (Simonsen et al. 1994, Trends Pharmacol Sci 15:437-441; Nakayama et al. 1992, Curr Opin Biotechnol 3:497-505; Aruffo, 1991, Curr Opin Biotechnol, 2:735-741), and signal-transducing proteins. See Seed et al., 1987, Proc Natl Acad Sci 84:3365-3369; Yamasaki et al., 1988, Science 241:825-828; and Lin et al., 1992, Cell 68:775-785, (type III TGF-β receptor) for examples of proteins identified by functional expression cloning in mammalian cells.

[0019] Once a disease target is identified, the protein target or engineered host cells that express the protein target have been used in biological assays to screen for lead compounds (Luyten et al. 1993, Trends Biotechnol 11:247-54). Thus, within the scheme of drug discovery, the use of gene expression libraries has been largely limited to the identification and production of potential protein disease targets. Only in those instances where the drug is a protein or small peptide, e.g., antibodies, have expression libraries been prepared in order to generate and screen for molecules having the desirable biological activity (Huse et al. 1991, Ciba Foundation Symp 159:91-102).

[0020] However, there are other applications of gene expression libraries that are relevant to drug discovery. Gene libraries of microorganisms have been prepared for the purpose of identifying genes involved in biosynthetic pathways that produce medicinally-active metabolites and specialty chemicals. These pathways require multiple proteins (specifically, enzymes), entailing greater complexity than the single proteins used as drug targets. For example, genes encoding pathways of bacterial polyketide synthases (PKSs) were identified by screening gene libraries of the organism (Malpartida et al. 1984, Nature 309:462; Donadio et al. 1991, Science 252:675-679). PKSs catalyze multiple steps of the biosynthesis of polyketides, an important class of therapeutic compounds, and control the structural diversity of the polyketides produced. A host-vector system in Streptomyces has been developed that allows directed mutation and expression of cloned PKS genes (McDaniel et al. 1993, Science 262:1546-1550; Kao et al. 1994, Science 265:509-512). This specific host-vector system has been used to develop more efficient ways of producing polyketides, and to rationally develop novel polyketides (Khosla et al., WO 95/08548).

[0021] Another example is the production of the textile dye, indigo, by fermentation in an E. coli host. Two operons containing the genes that encode the multienzyme biosynthetic pathway have been genetically manipulated to improve production of indigo by the foreign E. coli host (Ensley et al. 1983, Science 222:167-169; Murdock et al. 1993, Bio/Technology 11:381-386). Overall, conventional studies of heterologous expression of genes encoding a metabolic pathway involve cloning, sequence analysis, designed mutations, and rearrangement of specific genes that encode proteins known to be involved in previously characterized metabolic pathways.

[0022] In view of numerous advances in the understanding of disease mechanisms and identification of drug targets, there is an increasing need for innovative strategies and methods for rapidly identifying lead compounds and channeling them toward clinical testing.

[0023] The speed and availability of automated nucleic acid synthesis has led to rapid technological advances in biological research. For example, the availability of synthetic primers for sequencing has permitted researchers to decrease their time and labor involved in sequencing a particular nucleic acid by approximately sixty percent. Another technology which is facilitated by synthetic oligonucleotides is the polymerase chain reaction (PCR). This technique, which involves the exponential amplification of sequences between two synthetic primers, offers unprecedented detection levels and permits genetic manipulation of the amplified sequence. Further, the availability of synthetic primers allows a variety of genetic manipulations to be performed with relatively simple procedures, including site-specific mutagenesis and the custom design of genetic vectors.
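The exponential amplification referred to above can be illustrated with a short calculation. The following is a minimal sketch only: the function name `pcr_copies` and the per-cycle `efficiency` parameter are assumptions introduced for illustration and are not part of the described method. Ideal doubling per cycle corresponds to an efficiency of 1.0, and plateau effects are ignored.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical number of target copies after a given number of PCR cycles.

    Each cycle multiplies the amount of target by (1 + efficiency), so an
    efficiency of 1.0 is perfect doubling. This idealized model ignores
    reagent depletion and the plateau phase seen in real reactions.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare ideal doubling with a hypothetical 90% per-cycle efficiency.
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n), pcr_copies(1, n, efficiency=0.9))
```

Under this simplified model a single template molecule yields roughly a thousand copies after ten ideal cycles, which is why small changes in per-cycle efficiency have a large effect on final yield.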

[0024] Sequences to be cloned are also routinely modified with synthetic oligonucleotides. The modifications of either vector or insert sequence can range from the addition of a simple sequence encoding a restriction enzyme site to more complicated schemes involving modifying the translation product of the cloned sequence with a specific peptide or a variety of peptide sequences. Thus, these technological advances associated with synthetic oligonucleotides have afforded researchers many opportunities to study diverse biological phenomena in greater detail and with greater speed and accuracy.

[0025] Oligonucleotide synthesis proceeds via linear coupling of individual monomers in a stepwise reaction. The reactions are generally performed on a solid phase support by first coupling the 3′ end of the first monomer to the support. The second monomer is added to the 5′ end of the first monomer in a condensation reaction to yield a dinucleotide coupled to the solid support. At the end of each coupling reaction, the by-products and unreacted, free monomers are washed away so that the starting material for the next round of synthesis is the pure oligonucleotide attached to the support. In this reaction scheme, the stepwise addition of individual monomers to a single, growing end of an oligonucleotide ensures accurate synthesis of the desired sequence. Moreover, unwanted side reactions, such as the condensation of two oligonucleotides, are eliminated, resulting in high product yields.

[0026] In some instances, it is desired that synthetic oligonucleotides have random nucleotide sequences. This result can be accomplished by adding equal proportions of all four nucleotides in the monomer coupling reactions, leading to the random incorporation of all nucleotides and yielding a population of oligonucleotides with random sequences. Since all possible combinations of nucleotide sequences are represented within the population, all possible codon triplets will also be represented. If the objective is ultimately to generate random peptide products, this approach has a severe limitation because the random codons synthesized will bias the amino acids incorporated during translation of the DNA by the cell into polypeptides.

[0027] The bias is due to the redundancy of the genetic code. There are four nucleotide monomers, which lead to sixty-four possible triplet codons. With only twenty amino acids to specify, many of the amino acids are encoded by multiple codons. Therefore, a population of oligonucleotides synthesized by sequential addition of monomers from a random population will not encode peptides whose amino acid sequence represents all possible combinations of the twenty different amino acids in equal proportions. That is, the frequency of amino acids incorporated into polypeptides will be biased toward those amino acids which are specified by multiple codons.
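The codon redundancy described above can be made concrete by enumerating the sixty-four triplets under the standard genetic code and counting how many specify each amino acid. This is an illustrative sketch only: the names `CODON_TABLE`, `codon_counts` and `residue_frequencies` are introduced here for illustration, and the calculation assumes that each of the four nucleotides is incorporated with equal probability at every position.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG,
# TCT, ... GGG (first base varies slowest); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts() -> Counter:
    """Number of codons specifying each amino acid (and stop, '*')."""
    return Counter(CODON_TABLE.values())


def residue_frequencies() -> dict:
    """Expected frequency of each residue when all 64 codons are equally likely."""
    return {aa: n / len(CODON_TABLE) for aa, n in codon_counts().items()}


if __name__ == "__main__":
    # Print residues from most to least redundantly encoded.
    for aa, n in sorted(codon_counts().items(), key=lambda kv: -kv[1]):
        print(f"{aa}: {n} codons, expected frequency {n / 64:.3f}")
```

Under the equal-incorporation assumption, leucine, serine and arginine (six codons each) would be expected six times as often as methionine or tryptophan (one codon each), and three of the sixty-four random codons terminate translation.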

[0028] To alleviate amino acid bias due to the redundancy of the genetic code, the oligonucleotides can be synthesized from nucleotide triplets. Here, a triplet coding for each of the twenty amino acids is synthesized from individual monomers. Once synthesized, the triplets are used in the coupling reactions instead of individual monomers. By mixing equal proportions of the triplets, synthesis of oligonucleotides with random codons can be accomplished. However, the cost of synthesis from such triplets far exceeds that of synthesis from individual monomers because triplets are not commercially available.

[0029] It would therefore be useful to develop a method for synthesizing oligonucleotides which are designed for hybridizing to genes coding for bioactive compounds, antibiotics, and secondary metabolites. The present invention satisfies this need and provides additional advantages as well.

SUMMARY OF THE INVENTION

[0030] According to the present invention, there is provided a method of targeted cloning and enrichment of genes and gene clusters. This is accomplished by directly cloning the target gene from the source DNA using one of several novel methods presented, for example by creating template derived primers containing target oligonucleotides, adding these template derived primers to a sample of DNA and performing PCR to replicate those genes targeted by the template derived primers. The methods provide the degenerate cloning of the entire family of related target genes from a mixed DNA sample. This collection of related genes is then used to affinity purify and clone larger target gene containing fragments from the sample, representing associated biosynthetic pathway genes. The result is a target gene/pathway enriched genomic library. Also provided are the genes provided by this method and the probes used in connection with this method. These are also useful for hybridization screening of clonal libraries as well as culture collections.

DESCRIPTION OF THE DRAWINGS

[0031] Other advantages of the present invention will be readily appreciated as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings wherein:

[0032] FIG. 1 is a photograph showing a gel of the results of the degenerate nested pair PCR reaction for cloning the DHFR2 gene probe from marine sediment DNA; Lanes 10-15 are the products from the first PCR using primers DHFR2-1 and DHFR2-4; Lanes 3-8 are the products from the second PCR using products from the first PCR as template and primers DHFR2-2 and DHFR2-3; Lane 9 contains size markers; Reaction conditions were as specified in the text; The expected product size is about 120 bp as seen in lanes 8 and 4;

[0033] FIGS. 2A, B, and C illustrate the strategy for generating template specific primers and their use in specific cloning of unknown flanking sequences against a single known primer; details are discussed in the text;

[0034] FIG. 3 shows the PCR amplification of part of the Amp resistance gene from pBR325 as template using BcgI derived primers and the sequence specific pPstCW primer; the reaction mixture contains pBR325 digested with BamHI (50 ng) as template, 32-mer BcgI primers from pBR325 (gel purified, 12 pmol) and/or 32-mer sequence-specific primer pPstCW (40 pmol); BcgI primers were denatured for five minutes at 100° C. and immediately cooled on ice; PCR used program AF08 (T1=96° C., t=30 seconds; T2=56° C., t=1 minute; T3=72° C., t=10 seconds; reactions were carried out for 34 cycles); Lanes: 1. Mixture of BcgI oligonucleotides with template; 2. Mixture of BcgI oligonucleotides plus sequence specific pPstCW primer with template; 3. pPstCW primer with template but without BcgI primers; 4. pPstCW primer plus BcgI primers mixture without template; and 5. pBR325 (1 ug) digested with BcgI restriction endonuclease;

[0035] FIG. 4 shows the purified single-stranded DNA form from phages run on a 0.8% agarose gel (ethidium bromide stained); Lanes: 1. SS DNA M13 mp18 (1 ug); 2. SS DNA from M13 mp18 BcgI library from E. coli HB101 (1 ug); 3. SS DNA from M13 mp18 BcgI library from S. clavuligerus (1 ug); and 4. DNA marker lambda DNA digested with HindIII (1 ug) (Promega, Wis.);

[0036] FIG. 5 shows a 0.8% agarose gel analysis of biotinylated polymerase elongation products (BPEP); Panels: A) ethidium bromide stained gel; B) streptavidin-phosphatase assay from Southern blotting; PCR and polymerase elongation reactions (PER) were carried out in a 20 ul reaction format containing 30 pmol biotinylated primers and DNA template (100 ng for PCR or 1 ug for PER), either CsCl purified DNA from E. coli HB101 or DNA from S. clavuligerus; To improve elongation, reactions were carried out with 5 u TaKaRa LA DNA polymerase according to the manufacturer's instructions (TaKaRa, Japan); Primers were labeled with photobiotin (Vector, Calif.) according to the manual; Lanes: 1. PCR from HB101 DNA, primers AA and AB (sequence-specific primers for PEPase gene of E. coli); 2. PCR from S. clavuligerus DNA with primers ACVS 010 and 011 (ACVS 04-011 are sequence-specific primers for ACVS gene of S. clavuligerus); 3. Negative control PCR primers AA and AB without template; 4. biotinylated DNA marker lambda digested with HindIII (0.5 ug); 5. Negative control PCR primers ACVS 010 and 011 without template; 6. BPEP with AA primer and DNA HB101; 7. BPEP with AB primer and DNA HB101; 8. BPEP with ACVS 04 primer and DNA S. clavuligerus; 9. BPEP with ACVS05 and DNA S. clavuligerus; 10. BPEP with ACVS08 and DNA S. clavuligerus; 11. BPEP with ACVS 09 and DNA S. clavuligerus; 12. BPEP with ACVS 010 and DNA S. clavuligerus; and 13. BPEP with ACVS 011 and DNA S. clavuligerus;

[0037] FIG. 6 shows a blot of a streptavidin-phosphatase assay of binding of biotinylated polymerase elongated products (BPEP) to Avidin D beads (Vector Labs, Calif.); Lanes: 1. BPEP with primers AA and AB from E. coli HB101 DNA; 2. BPEP with primers ACVS 04-011 from S. clavuligerus DNA; Panels: A) BPEP before adsorption to beads; B) BPEP fraction unbound to beads; C) BPEP fraction incubated at 65° C. in TBST buffer; 50 ul (25 ng/ul) Avidin beads were equilibrated two times with three volumes of TBST buffer, mixed with 100 ul BPEP and incubated 2 hours at 37° C.; The beads were then washed three times with three volumes of TBST buffer and 2 ul was analyzed by streptavidin-phosphatase dot blotting assay;

[0038] FIG. 7 shows a photograph of an agarose gel analysis of bound and unbound fractions of SS DNA from M13 mp18 BcgI libraries; Lanes: 1., 7. DNA marker lambda HindIII (1 ug); 2. M13 mp18 original BcgI library from E. coli HB101 (10 ug); 3. Unbound fraction of HB101 BcgI library at 37° C.; 4. Bound fraction of BcgI HB101 library at 37° C.; 5. Unbound fraction of BcgI HB101 library at 65° C.; 6. Bound fraction of BcgI HB101 library at 65° C.; 8. M13 mp18 original BcgI library from S. clavuligerus (10 ug); 9. Unbound fraction of BcgI S. clavuligerus library at 37° C.; 10. Bound fraction of BcgI S. clavuligerus library at 37° C.; 11. Unbound fraction of BcgI S. clavuligerus library at 65° C.; 12. Bound fraction of BcgI S. clavuligerus library at 65° C.; 10 ul (1 ug/ul) of M13 mp18 SS DNA library was mixed with 50 ul of Avidin D-BPEP (biotinylated polymerase elongated product) in 100 ul of TBST buffer and incubated overnight at 55° C.; The temperature was then decreased to 37° C. for 10 minutes and 100 ul of unbound fraction was collected; The beads were washed three times with three volumes of TBST buffer for five minutes; The bound fraction was eluted with 100 ul of water by boiling at 100° C. for five minutes; All fractions were ethanol precipitated and dissolved in 30 ul water; 10 ul was analyzed by agarose gel electrophoresis and 3 ul was electrotransformed into Nova Blue E. coli cells (Novagen, Wis.);

[0039] FIG. 8 is a photograph of an agarose gel used to analyze the PCR amplification of S. clavuligerus DNA with sequence-specific ACVS and captured BcgI primers; PCR reactions were carried out in a 20 ul format with 100 ng of S. clavuligerus genomic DNA as template and 30 pmol primers; Lanes: 1. ACVS 04 plus 011; 2. ACVS 09 plus 011; 3. ACVS 08 plus 010; 4. ACVS 010 plus 011; 5. DNA marker lambda HindIII; 6. ACVS 04 plus 04w3bcg; 7. ACVS 04 plus 04w6bcg; 8. ACVS 04 plus 04w9bcg; 9. ACVS 04 plus 04w10bcg; 10. ACVS04 plus 04w13bcg; 11. ACVS09 plus 04w3bcg; 12. ACVS09 plus 04w6bcg; 13. ACVS09 plus 04w9bcg; 14. ACVS09 plus 04w10bcg; and 15. ACVS09 plus 04w13bcg;

[0040] FIG. 9 is a photograph of an agarose gel showing the PCR amplification products obtained using octamer primers calculated using the k-tuple strategy as described in the text; Template used is HB101 genomic DNA under otherwise standard conditions; Lanes 1, 1′, 18′ contain size markers; 2-9, oct03 as a solitary primer with varying buffer compositions (lanes 10-18 are empty); 2′-9′, standard primers for the phosphoenol pyruvate gene as a control; 10′-18′, oct01 as a solitary primer with the same varying buffer as in lanes 2-9; Products are of expected size for a random PCR (0.2-3 kb);

[0041] FIG. 10 is a photograph of an agarose gel comparing the PCR amplification products from FIG. 9; Lanes: 1, markers; 2, oct01; 3, oct03; Reactions were conducted under optimized conditions as judged from analysis of reactions shown in FIG. 9;

[0042] FIG. 11 is a photograph of an agarose gel showing the PCR products using k-tuple generated primers pair-wise with primers specific for the acvs gene; S. clavuligerus genomic DNA was used as template, under otherwise standard cycling conditions and temperature gradient (each primer pair PCR was conducted at 27° C., 34° C., and 42° C., left to right across the gel); Lanes: 1, size markers; 2-4, ACVS05 and oct01; 5-7, ACVS05 and oct02; 8-10, ACVS07 and oct01; 11-13, ACVS07 and oct02; Controls confirmed that amplification was due to pair-wise priming of specific and octamer primers, and not solitary priming by either primer alone;

[0043] FIG. 12 is a photograph of a hybridization blot analysis of a streptavidin-phosphatase assay of different fractions of biotinylated PCR probes during purification on Avidin DLA beads; Panels: A) original mixture of PCR probes 2 ul (50 ng/ul); B) unbound fraction of non-biotinylated PCR probes 2 ul (50 ng/ul); C) biotin eluted fractions of biotinylated PCR probes 2 ul (10 ng/ul); Lanes: 1. Bio IPNS 05+06 PCR product; 2. Bio StsC03+04 PCR product; 200 ul of Avidin DLA beads were used for purification (capacity 25 ng/ul);

[0044] FIG. 13 is a photograph showing a 1% agarose gel electrophoresis analysis of Avidin DLA purified biotinylated PCR probes; Lanes: 1. Bio IPNS 05+06 5 ul (10 ng/ul); 2. Bio StsC 03+04 5 ul (10 ng/ul); Panels: A) ethidium bromide stained gel; B) streptavidin-phosphatase assay from the Southern blotting of the gel;

[0045] FIG. 14 is a photograph of the results of screening pFD666 and pSCOS1 S. griseus genomic libraries, enriched for aminoglycoside genes, with alk-direct labeled StsC03+04 probes; Panels: A) original library (total of 500 colonies on the plate); B) StsC and recA captured library after electrotransformation (total 250 colonies on plate); C) library derived from StsC and recA captured chromosomal DNA fragments cloned into pSCOS1 cosmid vector (total 2000 colonies on plate); Results demonstrate over a 100-fold enrichment for the specific gene, as compared to the expected number of positive clones in the unenriched library;

[0046] FIG. 15 is a photograph of several dot-blots of positive clones from libraries enriched for the acvs gene (left panel) and strB1 (right panel), corresponding to the beta lactam and aminoglycoside biosynthetic clusters, respectively; Genomic libraries were constructed from S. clavuligerus (acvs) and S. griseus (strB1); DNA from positive clones frequently hybridized with several additional gene probes associated with their respective clusters (Table II), demonstrating the cloning of intact clusters and entire pathways;

[0047] FIG. 16 is a photograph of an agarose gel used in the PCR analysis of several clones enriched for the aminoglycoside cluster (FIG. 14); Lanes 1 to 30: 1st PCR, using E. coli cells with different plasmids as template; Lanes 1′ to 30′: 2nd PCR, using 1st PCR products as template; 1: MW marker (100 bp ladder); 2 to 8: 1st PCR using StrD primers; 9 to 15: 1st PCR using StsA primers; 16: MW marker (100 bp ladder); 17 to 23: 1st PCR using StrB1 primers; 24 to 30: 1st PCR using StsC primers; 1′: MW marker (100 bp ladder); 2′ to 8′: 2nd PCR using StrD primers; 9′ to 15′: 2nd PCR using StsA primers; 16′: MW marker (100 bp ladder); 17′ to 23′: 2nd PCR using StrB1 primers; 24′ to 30′: 2nd PCR using StsC primers; Each set of seven lanes with the same primer follows the same order: (1) no template; (2) pFD666 as template; (3) B1-1 as template; (4) B3-1 as template; (5) B20-2 as template; (6) B20-4 as template; (7) B16str5 as template; These results confirm the retention and stable cloning of the cluster in many clones and corroborate the hybridization results indicating the presence of these genes; Additionally, the utility of many of the oligos listed in Table I and used in double nested PCR as described herein is also demonstrated;

[0048] FIG. 17 is a photograph of antibiotic selection plates demonstrating the heterologous expression of S. griseus aminoglycoside resistance in E. coli; Left Panel: gradient plates from 0 (bottom) to 25 (top) ug/ml streptomycin; The left side of each plate contains a spread from a single positive colony that hybridized to the strB1 gene probe; the right side of each plate contains the E. coli host transformed with cosmid containing no insert; Right Panel: plates contain the same clones as in the left panel; Plates contain 0, 5, 15, 25 ug/ml streptomycin, clockwise from the upper left plate; and

[0049] FIG. 18 shows yet another example of hybridization probing of genomic libraries using several of the gene probes; SFT4 is a library constructed from a trimethoprim resistant seawater isolate, carries the DHFR2 gene, and demonstrates antibacterial activity against S. aureus in standard antibiotic challenge assays.

DETAILED DESCRIPTION OF THE INVENTION

[0050] Generally, the present invention provides a method and probes for use in targeted cloning and enrichment of genes and gene clusters from an otherwise mixed and very diverse population of DNA. The methods provide the degenerate cloning of the entire family of related target genes from a mixed DNA sample. This collection of related genes is then used to affinity purify and clone larger target gene containing fragments from the sample, representing associated biosynthetic pathway genes. The result is a target gene/pathway enriched genomic library. Also provided are the genes provided by this method and the probes used in connection with this method. These are also useful for hybridization screening of clonal libraries as well as culture collections.

[0051] Genomics and bioinformatics can be used to identify specific genes and DNA sequences that correlate with the biosynthesis of specific structural classes of compounds, including many secondary metabolites. This is often conducted through a comparison of either the nucleotide gene sequences of known related genes or the protein sequences of the gene products through multiple sequence alignments. Constant or conserved regions within related sequences are thought to be important for protein function and will also be conserved in undiscovered genes of the related class. Cloning the entire population of target genes coding for a specific function allows for the associated, clustered biosynthetic pathways to also be cloned in a very specific and targeted manner (see below). Additionally, using degenerate PCR cloning permits the cloning of both closely as well as distantly related genes within a specific target class, subsequently permitting the cloning and capture of the entire genetic and chemical diversity for the target compound class of interest.

[0052] Degenerate-nested temperature gradient PCR is used for the successful cloning of the majority or even entire population of related genes from a mixture of many genomes and otherwise unrelated DNA, such as the total DNA isolated from a sample of soil or other environmental source. Nested sets of degenerate PCR primers have been designed for a variety of target genes (see TABLE I).

[0053] Several oligonucleotide PCR primers and hybridization probes were designed and then synthesized to target DNA sequences from a variety of sources that potentially contain bioactive compound coding, or resistance genes. The design of each oligo was conducted based on the alignment of sequences of the gene and/or protein family of interest that are available publicly (i.e. through GenBank). Several sequences were used, if available. In some cases, only a single unique sequence was available and used in calculating the oligo sequences. Most oligos were designed as degenerate nested pairs in order to maximize their capacity for the cloning and discovery of both closely, as well as distantly related novel sequences that, likewise, code for novel proteins and enzymatic products, such as secondary metabolites useful as lead drug compounds for screening.
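The trade-off between a highly degenerate outer primer and a less degenerate nested primer can be illustrated with a short calculation. The following is a minimal sketch only: it uses the standard IUPAC ambiguity codes, and the two primer sequences are hypothetical placeholders, not primers from Table I.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer):
    """List every concrete sequence that a degenerate primer represents."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


# Hypothetical primers, for illustration only (not taken from Table I).
outer = "GGNTAYGCNCARATHGA"  # highly degenerate outer primer
inner = "TAYGCSCAGATCGA"     # less degenerate nested primer

print(degeneracy(outer), degeneracy(inner))
```

A lower degeneracy for the nested primer means fewer primer species in the second reaction, which is consistent with the stated aim of using nested pairs.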

[0054] The general method used for cloning target genes using degenerate nested temperature gradient PCR uses the following steps. First, a temperature gradient spanning 41-60° C. is established for the 1^(st) PCR. Reactions are run in buffers having a pH of 8.3-9.2, MgCl₂ at 1.5-3.5 mM, and KCl at 25 and 75 mM (Stratagene PCR Optimization Kit). Each reaction, 10-30 ul in volume, is placed in a 0.2 ml tube and run for 30-35 cycles. The 1^(st) PCR products are diluted ten-fold and used as templates for a 2^(nd) PCR reaction. The 2^(nd) PCR is run at 52° C.; the other conditions are the same as for the 1^(st) PCR. A gel is then run and the band of the expected product size is excised. DNA is extracted from the gel using a gel extraction kit (Qiagen). The PCR product is cloned into a pT7Blue-3 vector (Novagen) according to the manufacturer's protocol. Clones containing the target PCR product are screened by PCR and/or dot-blot hybridization. The plasmid is then purified. Automated sequencing is done using a Thermo Sequenase Cy5.5 terminator cycle sequencing kit (Amersham) and 50-250 fmol template with M13 forward or M13 reverse primers (2 pmol each).
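The set of first-round conditions described above can be enumerated as a simple grid. This is a sketch under stated assumptions: the temperature, pH, MgCl₂ and KCl ranges are those given in the text, while the number of gradient wells (12) and the intermediate pH and MgCl₂ values are illustrative choices, not part of the described method.

```python
from itertools import product


def gradient(t_low, t_high, n_wells):
    """Evenly spaced annealing temperatures across a gradient block."""
    step = (t_high - t_low) / (n_wells - 1)
    return [round(t_low + i * step, 1) for i in range(n_wells)]


# Range limits come from the text; the well count and the intermediate
# buffer values are assumptions made for this example.
anneal_temps = gradient(41.0, 60.0, 12)
ph_values = [8.3, 8.8, 9.2]
mgcl2_mM = [1.5, 2.5, 3.5]
kcl_mM = [25, 75]

first_pcr = [
    {"anneal_C": t, "pH": ph, "MgCl2_mM": mg, "KCl_mM": k}
    for t, ph, mg, k in product(anneal_temps, ph_values, mgcl2_mM, kcl_mM)
]

# The second PCR uses a single annealing temperature and a ten-fold
# dilution of the first-round product as template.
second_pcr = {"anneal_C": 52.0, "template_dilution": 10}

print(len(first_pcr), "first-round conditions")
print(anneal_temps)
```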

[0055] The new sequence is aligned and compared with consensus target genes to confirm the degree of uniqueness by performing a BLAST search and sequence analysis.

[0056] In all cases, using these degenerate primers in the following way improves amplification significantly and reduces the number of unrelated misprimed products. Such products are a problem when PCR products are to be sequenced directly in the discovery of new genes. Most misprimed products can be eliminated by conducting a degenerate, limited-degenerate, nested PCR (DLDN-PCR). The first PCR can be highly degenerate, which aids in the potential discovery of distantly related genes. However, this also results in more unrelated amplification products. The result is clearly seen on an agarose gel of the PCR reaction, where the expected product band is rather diffuse and products of unexpected sizes are present (FIG. 1, lanes 14, 15). Analysis of this band on a 10% acrylamide gel reveals the presence of several bands, each varying slightly in size, and presumably sequence. However, conducting a second PCR using the gel purified product band or first PCR reaction mixture as template and a nested set of less degenerate primers results in amplification of only specific template targets (FIG. 1, lanes 4, 8). In this case, a 150 bp product is clearly resolved and amenable to purification and sequencing. This is because the chance that an unrelated misprimed product also contains a second mispriming site (in addition to the one that misprimed in the first PCR) is very remote. This was confirmed by cloning and sequencing the PCR products from a single, yet diffuse DHFR2 band of about 150 bp. Although the cloned sequences all contained the priming sites, a much smaller percentage contained any related DHFR2 sequence bounded by the primer sites. Thus, a second PCR results in amplification of only the truly related molecules from the population of products. Cloning and sequencing the products from this second PCR demonstrates this “filtering” effect. The result is a reliable strategy for generating degenerate PCR products amenable to direct sequencing. However, if the number of specific products is high, as is sometimes the case for relatively common amplicons in environmental samples, then cloning the PCR products results in a large number of clones with specific product for sequencing.

[0057] Using the above described primers (Table I) and DLDN-PCR, several unique genes were discovered and sequenced from marine and terrestrial microbial genomes. These genes relate to a wide variety of biosynthetic pathways and structural classes of compounds, including antimetabolites, beta-lactams, polyketides, other antibiotics, taxanes, and others.

[0058] An example of this method involves a SC16RA01 probe which was generated using the “universal” 16S rRNA PCR primers to amplify a 600 bp DNA product from S. clavuligerus. This probe is useful for colony hybridization probing for Streptomycetes and other related high GC content genomes. Additionally, this probe has been used in the PCR amplification of similar genomic DNA from a heterogeneous population.

[0059] The resulting gene probes can be used for the discovery of either single genes or entire clusters of adjacent genes involved in the total synthesis of compounds of interest, for example secondary metabolite biosynthetic pathways, the products of which comprise very useful libraries for antibiotic and other therapeutic compound screening. This is especially promising given the relatively recently emerging picture of the clustering of secondary metabolite pathway genes on bacterial and fungal chromosomes.

[0060] The following adaptation of the present invention describes a method for the generation and use of highly specific PCR primers derived from the template itself. However, their sequence need not be known a priori. This adaptation also exploits some unique and novel properties of restriction endonucleases, using BcgI as an example (FIG. 2).

[0061] BcgI is a novel Type II restriction endonuclease originally isolated from Bacillus coagulans, and is now commercially available. The recognition sequence for BcgI is shown below and consists of a specific 6 base pair site of DNA sequence. However, the enzyme cleaves outside of this recognition site and generates a 32 bp restriction fragment:

5′-/(N)₁₀CGA(N)₆TGC(N)₁₂/-3′
3′-/(N)₁₂GCT(N)₆ACG(N)₁₀/-5′

[0062] Each restriction fragment is statistically unique in sequence and can be used as a specific oligonucleotide primer. The frequency of occurrence of the recognition site is the same as that for a random six base sequence, or about once every 4,000 nucleotides (i.e. 1/4⁶). However, the uniqueness of the fragment is extraordinary because it contains 34 nucleotides and corresponds to a randomized occurrence of once in 2.9×10²⁰ bases. Random sequence analysis has confirmed the uniqueness of these restriction fragments and that they are not merely a frequently occurring repeat. This provides the basis for these fragments serving as very specific PCR primers and hybridization probes, each fragment highly specific for its own recognition sequence. By digesting an entire genome, entire or partial chromosome, or mixture of many genomes, very specific primers can be produced with priming sites spaced approximately 4,000 bp apart along the template DNA, ideal for PCR amplification and cloning.
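The frequency figures above follow from simple counting, and the 34-nucleotide fragments can be located in silico with a pattern search. The following is a minimal sketch, assuming the (N)₁₀CGA(N)₆TGC(N)₁₂ pattern given above; scanning both strands and the toy template sequence are assumptions made for this example.

```python
import re

# Top-strand pattern from the text: (N)10 CGA (N)6 TGC (N)12.
# The lookahead allows overlapping sites to be reported.
BCGI = re.compile(r"(?=([ACGT]{10}CGA[ACGT]{6}TGC[ACGT]{12}))")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bcgi_fragments(seq):
    """Return the 34-nt fragments matching the pattern on either strand."""
    seq = seq.upper()
    frags = [m.group(1) for m in BCGI.finditer(seq)]
    frags += [m.group(1) for m in BCGI.finditer(revcomp(seq))]
    return frags


# Expected spacing of a fully specified site of a given length in
# random sequence with equal base frequencies.
site_spacing = 4 ** 6        # six specified bases: about once per 4,000 nt
fragment_spacing = 4 ** 34   # 34 specified bases: about 2.9 x 10^20

# Hypothetical toy template containing a single site.
toy = "GGGGG" + "A" * 10 + "CGA" + "TTTTTT" + "TGC" + "A" * 12 + "GGGGG"
fragments = bcgi_fragments(toy)
print(site_spacing, fragment_spacing, fragments)
```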

[0063] The library of these unique oligonucleotides that are produced from strain specific genomic DNA or a mixed population of environmental DNA can be used as a set of primers for PCR and in combination with gene-specific primers, can be used for amplification and cloning of neighboring regions of DNA surrounding specific genes. Therefore, this technique can also be used for cloning large segments of DNA adjacent to a specific target site, including complete bacterial operons, or biosynthetic pathway gene clusters from any organism. More broadly stated, this adaptation of the present invention can selectively and very efficiently (i.e. with high selectivity) amplify and clone from a mixture of DNA the regions flanking any specific target without any prior knowledge of the sequence to be cloned.

[0064] The simplest application of this adaptation of the present invention is to use the entire set of BcgI template derived oligonucleotide primers in a PCR that also contains a target specific oligonucleotide. A model system has been developed with pBR325. Then, the method of the present invention was used to amplify a 300 bp fragment of the ampicillin resistance gene using a specific primer and a random mixture of template derived primers from a BcgI digest of the pBR325 plasmid, which contains three BcgI cleavage sites (FIG. 3). It was also determined that this method was effective with both linear and circularized template, using otherwise conventional PCR conditions.

[0065] An example of a more extensive and specific application of this adaptation of the present invention involves the identification and isolation from the entire BcgI restriction digest of the single oligonucleotide containing the priming site most proximal to a specific oligonucleotide on the template to be amplified and cloned. The following steps describe the method of the present invention:

[0066] 1. DNA isolation and purification from bacterial strains or total environmental DNA from sources such as soil, water, etc. using known procedures such as guanidine thiocyanate, CTAB, cesium chloride gradient or their combination and/or modification;

[0067] 2. Digestion of isolated DNA with BcgI endonuclease (NEB, protocol) and preparative purification of BcgI 34-mer oligonucleotides using 15% PAGE or 2% agarose gel in combination with the QIAEX II purification system (Qiagen, Calif.) or any similar purification system.

[0068] 3. Construction of a 32-mer BcgI oligonucleotide DNA library in M13 phage or any other phagemid vector that does not contain BcgI sites. The vector is first digested with SmaI, EcoRV or any other unique restriction endonuclease that produces blunt ends, followed by phosphatase (CIP) treatment. The purified 32-mer BcgI oligonucleotides are treated with Klenow fragment of DNA polymerase I or T4 DNA polymerase in conjunction with polynucleotide kinase in the absence of any dNTP, but in the presence of ATP, in order to convert the 3′-protruding ends generated by BcgI restriction endonuclease to blunt ends appropriate for cloning. The BcgI restriction fragments are now 32 bp. Equimolar concentrations of the vector and blunt ended 32-mer oligonucleotides are ligated using T4 DNA ligase, followed by transformation into any conventional strain of E. coli (e.g. JM101, TG1 or ER2267) by chemical transformation or electroporation using conventional protocols;

[0069] 4. The library of phages is washed out from the agar plates following transformation and single stranded DNA is purified by standard methods (FIG. 4).

[0070] 5. Specific primer (specific probe for the gene of interest) is labeled with biotin either at the 5′-end or randomly using a biotin labeling system (Vector Labs), or any other labeling system (e.g. fluorescein).

[0071] 6. A single stranded, labeled copy of the sequence to be cloned is produced as follows. Annealing and elongation of the labeled, gene specific primer with genomic DNA template produces a single-stranded, biotinylated copy of the DNA of interest, including the sequence downstream of and flanking the known region that contains the annealing site of the gene specific probe (FIG. 5). The biotinylated copy of DNA is isolated by adsorption onto an avidin or streptavidin containing matrix, such as Avidex (Vector Labs, Calif.) or any other affinity matrix (FIG. 6);

[0072] 7. Single stranded oligonucleotide DNA from the phage library (step 4) is then hybridized with single stranded biotinylated DNA under appropriate conditions. All non-specific phage DNA is then washed out, so that only phage DNA containing complementary sequences remains hybridized to the biotinylated DNA. A subsequent boiling procedure releases the single-stranded phage DNA, which can then be retransformed into E. coli and amplified in vivo (FIG. 7).

[0073] 8. The phage library can be used for the generation of second primers either with PCR of the polylinker region or by BcgI digestion; Repeating steps 5-7 results in a nested set of PCR primers that can be used to amplify an entire biosynthetic pathway;

[0074] 9. The phage library is used for generation of second primers for PCR. Of the many ways this can be accomplished, two examples are described. First, the 32 bp region of insert was sequenced directly and this sequence was used for oligonucleotide synthesis. As an example, this yielded several primers, including GGGTCCGGCAGACCGTTCGCGGGCCGGAC, GAGCGGACCGCACCGCGATCGGAACAACCT, TCTCCGGGGCAGCGCGGTCGCGGAACGT.

[0075] A BLAST search confirmed their relatedness with the genus Streptomyces, as expected. Second, the polylinker region containing the 32 bp insert (desired PCR cloning primer) was amplified by PCR using the M13 universal and M13 reverse primers, generating a 184 bp PCR product. The 32 bp BcgI fragment is flanked by unique EcoRI and BamHI restriction endonuclease sites, and restriction with these enzymes was used to generate a 52 bp fragment, which was subsequently converted into a set of nested single-stranded oligonucleotides by treatment with ExoIII nuclease under standard conditions. Each oligonucleotide in this nested set has the same 5′-end but a different level of deletion at the 3′-end. Therefore, it can be used for PCR cloning as a second primer against a specific target primer. This approach can be used to clone several full length genes, operons, and entire biosynthetic pathways, as demonstrated in the next step.
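The structure of such a nested set (a common 5′-end with progressively deleted 3′-ends) can be pictured as follows. This is an illustrative sketch only: the input is the first primer quoted in step 9, and the minimum length and deletion step are assumptions, since an actual exonuclease digest produces a distribution of lengths rather than fixed increments.

```python
def nested_3prime_deletions(oligo, min_len=18, step=2):
    """Oligos sharing the 5'-end of `oligo`, shortened from the 3'-end.

    min_len and step are illustrative assumptions, not values from the text.
    """
    return [oligo[:n] for n in range(len(oligo), min_len - 1, -step)]


# First primer quoted in step 9 of the text.
primer = "GGGTCCGGCAGACCGTTCGCGGGCCGGAC"

nested = nested_3prime_deletions(primer)
for oligo in nested:
    print(len(oligo), oligo)
```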

[0076] 10. This method was used to clone a region flanking a specific priming site within the acvs gene of S. clavuligerus. Combination of the first primer (gene specific) and secondary primers in a PCR results in the generation of sequences flanking that of the gene specific primer annealing site (FIG. 8). Additional sequences can be subsequently cloned and combined into one operon for expression of proteins that produce secondary metabolites of interest (for example antibiotics).

[0077] Yet another adaptation of the present invention describes a method for the generation and use of highly specific PCR primers with frequently occurring priming sites across a wide range of genomes. These primers are novel and very useful for cloning sequences flanking a target sequence with no prior knowledge of the sequence to be cloned. This set of primers, collectively, is useful as a universal primer library, with specificity based on the criteria used in its generation.

[0078] This adaptation of the present invention relies on the analysis and interpretation of DNA sequences from a variety of genera, searching for relatively long and frequently repeated sequences across a wide range of genomes.

[0079] As an example, a genomic analysis of bacterial DNA protein coding regions was conducted and subsequently a set of 21 universal priming octamer oligonucleotides was constructed based on a very high frequency of repeating 8 base sequences. In addition, PCR conditions were optimized using various thermopolymerases, including the Stoffel polymerase fragment, and these “universal primers” were shown to prime against specific target primers. A 10 base universal oligonucleotide primer library was also constructed, and the sequence analysis data reveals that an oligonucleotide set of virtually any length can be constructed. The frequency of occurrence decreases with increasing oligonucleotide length; nevertheless, long amplification PCR techniques make even these highly specific but less frequently binding oligonucleotides quite useful.

[0080] As an example, octamer and decamer oligonucleotide libraries were generated by performing a k-tuple search and analysis using a proprietary gene database. This database consisted of 15 genera representing 34 bacterial and 4 fungal species, and 38 protein coding genes. The species included in this database were represented in a weighted fashion based on the known/perceived frequency and importance of secondary metabolite production. Of the nearly 65,000 octamers calculated, only a subset of approximately 200, or approximately 0.3%, were frequently present within every or most of the genes included in the database, and thus useful for universal PCR cloning. For example, the octamer OS-OCT-003 with sequence CTCGCCGA occurs 30 times and at least once in nearly every species. This corresponds to a determined average frequency of once every 1,625 nucleotides, while the random frequency for an eight base sequence is only once every 65,000 nucleotides. Similar calculations based on k=10 resulted in a smaller number of equally frequent 10 base sequences, also useful for PCR primers. An example of 25 octamers and 12 decamers generated and used successfully for cloning are shown in Tables II and III.

TABLE II
High Frequency Bacterial CDS Octamers

Name          Sequence, 5′-3′    Frequency
OS-OCT-001    GTCGGCGA           30
OS-OCT-002    CCAGATCG           21
OS-OCT-003    CTCGCCGA           23
OS-OCT-004    CGACATCG           18
OS-OCT-005    GCCGATCA           17
OS-OCT-006    GCCACCGA           15
OS-OCT-007    GATGCCGA           17
OS-OCT-008    CGGCGAAG           19
OS-OCT-009    CGGCGAAC           19
OS-OCT-010    GGCGATCA           15
OS-OCT-011    GCCGAGGA           17
OS-OCT-012    CGCCGACA           17
OS-OCT-013    ATCGCCGA           13
OS-OCT-014    GGCGAACC           13
OS-OCT-015    GCCGACCA           14
OS-OCT-016    GCCAAGGA           15
OS-OCT-017    CGGCAACG           16
OS-OCT-018    GGCTGGAC           13
OS-OCT-019    GCAGCACC           14
OS-OCT-020    CCAGCCAG           16
OS-OCT-21     CGCCGCCG           39
OS-OCT-22     CGGCGACC           34
OS-OCT-23     CCGCCGCC           33
OS-OCT-24     CGCGGCCG           31
OS-OCT-25     GTCGGCGA           30

[0081] TABLE III
High Frequency Bacterial CDS Decamers

Name          Sequence, 5′-3′    Frequency
OS-DEC-001    CAGCTCGGCG         8
OS-DEC-002    GCCGGTGAGC         7
OS-DEC-003    CCGGGTCGAG         7
OS-DEC-004    GGCGCCGCCC         6
OS-DEC-005    GGCGCCGCCC         6
OS-DEC-006    CGAGGTCGAG         6
OS-DEC-007    CGAGCAGGCC         6
OS-DEC-008    CGACGCGGGC         6
OS-DEC-009    CCTGGCCGCG         6
OS-DEC-010    CCTGCGCGGC         6
OS-DEC-011    ACGGCCGCGG         6
OS-DEC-012    CGAGGACGTC         5
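A k-tuple search of the kind described can be sketched as a sliding-window count over a set of coding sequences. This is a minimal illustration, not the proprietary analysis itself: the two toy gene sequences are hypothetical, and the random-spacing figure assumes equal base frequencies.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows with ambiguous bases
                counts[kmer] += 1
    return counts


def genes_containing(sequences, kmer):
    """Number of sequences in which the k-mer occurs at least once."""
    return sum(1 for seq in sequences if kmer in seq.upper())


def random_spacing(k):
    """Expected spacing of a given k-mer in random, equal-frequency DNA."""
    return 4 ** k


# Hypothetical toy sequences standing in for database entries.
toy_genes = [
    "ATGGTCGGCGATCTCGCCGAAGGTCGGCGA",
    "ATGCTCGCCGACCGTCGGCGATTGA",
]

counts = kmer_counts(toy_genes, 8)
print(counts.most_common(3))
print(random_spacing(8), random_spacing(10))
```

Ranking the counts and keeping only k-mers present in most sequences would correspond to the selection of frequent octamers and decamers described above.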

[0082] The specificity of these octamers and decamers toward bacterial protein coding sequences was confirmed by frequency analysis in mammalian DNA. The frequency in human DNA for each octamer was at least ten-fold less than in bacterial DNA, which was used as one criterion for selecting the octamers and decamers from the entire set generated. Additionally, a randomized search against known consensus sequences revealed no matches with most oligonucleotides generated. This confirms that these oligonucleotides are indeed novel, unique, and useful for specific universal cloning of bacterial DNA present in a mixture. Furthermore, both the presence and high-level frequency of several of these octamers were confirmed within several desired cloning sequences (e.g. S. clavuligerus ipns gene).
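The ten-fold criterion can be expressed as a comparison of normalized frequencies in the target and background DNA. The sketch below is one possible reading of that criterion; the counts and sequence lengths are hypothetical numbers chosen only to exercise the function.

```python
def per_kb(count, total_nt):
    """Occurrences per 1,000 nucleotides."""
    return 1000.0 * count / total_nt


def select_enriched(target_counts, target_nt,
                    background_counts, background_nt, min_fold=10.0):
    """Keep k-mers at least min_fold more frequent in target than background."""
    selected = []
    for kmer, n in target_counts.items():
        target_freq = per_kb(n, target_nt)
        background_freq = per_kb(background_counts.get(kmer, 0), background_nt)
        # A k-mer absent from the background passes the filter outright.
        if background_freq == 0 or target_freq / background_freq >= min_fold:
            selected.append(kmer)
    return sorted(selected)


# Hypothetical counts, for illustration only.
bacterial = {"CTCGCCGA": 30, "AAAAAAAA": 5}
human = {"CTCGCCGA": 2, "AAAAAAAA": 40}

chosen = select_enriched(bacterial, 48750, human, 50000)
print(chosen)
```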

[0083] Using this method, PCR with the octamer set has been clearly achieved using E. coli HB101 genomic DNA (gDNA) as template (FIGS. 9 and 10). When used as solitary oligonucleotides in PCR reactions, the amplification products were observed in the size range of 0.2-3 kb, consistent with that predicted from the calculated frequency of the octamers. This demonstrates the utility of this octamer set for genotyping, in addition to cloning via amplification against a specific primer. A similar result has been demonstrated with S. clavuligerus gDNA used as template. Additionally, the ability to use these octamers as pair-wise PCR primers was demonstrated by amplifying a product using ACVS-04, a proprietary degenerate primer for the pcb A gene (FIG. 11).
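The product sizes expected when a single octamer primes both strands can be estimated in silico by pairing each forward occurrence with each downstream reverse-complement occurrence. This is a simplified sketch: it requires exact matches, ignores annealing thermodynamics, and uses a hypothetical toy template; the 0.2-3 kb window mirrors the size range reported above.

```python
def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq, motif):
    """Start positions of every (possibly overlapping) occurrence of motif."""
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def single_primer_products(template, primer, min_len=200, max_len=3000):
    """Sizes of products expected when one oligo primes both strands."""
    template = template.upper()
    forward = find_all(template, primer)           # extends rightwards
    reverse = find_all(template, revcomp(primer))  # extends leftwards
    sizes = []
    for f in forward:
        for r in reverse:
            size = r + len(primer) - f
            if min_len <= size <= max_len:
                sizes.append(size)
    return sorted(sizes)


# Hypothetical toy template: one forward site, 300 nt of filler,
# and one inverted site.
octamer = "CTCGCCGA"
toy_template = octamer + "A" * 300 + revcomp(octamer)
sizes = single_primer_products(toy_template, octamer)
print(sizes)
```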

[0084] The present invention is different from random priming and arbitrary priming in the following ways. Random priming is not specific for any type of DNA. Conversely, random primers are generally kingdom specific, as opposed to RAPD, (random amplified polymorphic DNA method) which is a DNA polymorphism analysis system based on the amplification of random DNA segments with single primers of arbitrary nucleotide sequence. Instead, the present invention uses primers specifically designed from thorough analysis of DNA databases, and the resulting oligonucleotides are universal for genomes included in the database. For example, in a method for RAPD PCR differentiation of Streptomyces species, none of the twelve 10-mer oligonucleotides matched the sequence of the over 65,000 oligonucleotides generated by the method for bacterial DNA amplification.

[0085] The use of the present invention also has a distinct advantage when the desired target sequence is derived from DNA of a mixed source, for example total purified DNA from soil. This population of total DNA will contain bacterial as well as fungal, plant, and potentially a host of many other contaminating DNAs, making it difficult to amplify specifically a product from a single group of the constituent DNA, such as that of a desired bacterial gene. However, a universal primer set constructed as described in this invention allows for universal priming of a specific subset of the total DNA population, only bacterial DNA in this example. For example, a specific bacterial gene can be amplified from a mixture of bacterial and mammalian DNAs using a single gene specific primer in conjunction with a universal library of oligonucleotides constructed as described in the present invention.

[0086] Another example of the utility of the present invention is demonstrated by using it to amplify against a specific primer in order to clone the region of gDNA flanking the specific primer annealing site. Streptomyces clavuligerus gDNA was used as template with specific ACVS, IPNS, and other specific primers to demonstrate the technique with high GC containing DNA.

[0087] The combined results from all methods described in the present invention for the direct cloning of unique target genes from marine and terrestrial microbial genomes are listed in Table IV. In summary, this collection contains 52 novel genes with homologies to the prototype gene or consensus sequence ranging between 35-90%. This includes a total of 10 classes of target genes, each gene within a class confirmed by sequencing (see Table IV).

[0088] Other adaptations of this invention center around the optimization and refinement of environmental DNA isolation and purification; PCR conditions and additives including DMSO, formamide, and others, use of neutral base substitutions and tails incorporated into the primers (such as d-azaGTP in place of dGTP and inosine tails of 2-6 bases), and specific temperature cycling protocols; the construction and use of degenerate primers based on calculated universal primers, including the use of inosine in primers to increase length and annealing temperature; and the construction and use of labeled primers, such as biotinylation.

[0089] Cloned target genes representing the biosynthetic pathways or, in general, any flanking sequence, can be affinity purified from a diverse mixture of DNA, such as environmental DNA or total genomic library DNA. This includes both circular and linear DNA. Subsequently, the entire captured fragment containing the target gene/pathway is cloned and propagated in a variety of expression/cloning host organisms and assayed for bioactivity based on the compound class of probe gene chosen. The method is based on RecA mediated homologous recombination and affinity chromatography.

[0090] Generally, the method consists of the following steps: i) biotinylating and affinity purifying the cloned probe gene; ii) reacting the biotinylated probe with diverse, mixed DNA containing sequences complementary to the probe; iii) capturing the hybrids of probe and complementary fragments on an avidin support; iv) eluting the captured fragments; and v) molecular and/or biological cloning of the fragments and propagation in any suitable host, such as E. coli or S. lividans.

[0091] Other uses of the novel cloned genes include hybridization screening, as exemplified abundantly by the data presented throughout this disclosure. For example, all probes/primers have been labeled with biotin and used successfully for the chemiluminescent discovery of novel target genes from Southern blots of environmental DNA and genomic clones. Subsequent cloning and sequencing of these target genes was used to confirm that each probe bound specifically to its intended target. Thus, these probes are very useful (specific and sensitive) for the discovery and isolation of novel target genes, related gene clusters, and biosynthetic pathways.

[0092] The use of the DHFR2 oligos is especially promising for the discovery of novel folate antimetabolites, and their coding genes and gene products (biosynthetic enzymes). The only known sources of the DHFR2 genes from which the oligos were generated are TMP resistant clinical isolates. TMP is a synthesized antibiotic and thus a search for a natural producer using genetic determinants for clinical resistance is quite novel. The DHFR2 oligo targets a unique form of DHFR protein that is unrelated to the chromosomal or other mutant forms that confer clinical resistance to TMP. The origin of this gene and protein has not been determined. DHFR2 can originate from a TMP-like biosynthetic pathway, conferring self-resistance to the producer. Following this model, the DHFR2 gene should be clustered within the entire TMP-like pathway. Thus, detection of the DHFR2 gene also provides the entire pathway within the regions directly flanking the gene. The results clearly demonstrate the utility of the method of the present invention and have demonstrated the presence and possible origin of this unique gene in several environmental bacterial isolates, as judged by colony hybridization probing, PCR, and sequence analysis of the gene.

[0093] ACVS04 (degenerate) and ACVS05 primers were used to PCR clone and sequence an approximately 500 base pair product from S. clavuligerus genomic DNA. This PCR was designed to generate 400 bp of known S. clavuligerus ACVS sequence and 100 bp of new sequence of this gene. This strategy allows the accuracy of the sequence to be assessed by comparison to a known sequence while also generating new sequence. This confirmation allows for the routine use of the primers for generating new sequence directly from degenerate PCR products, a much more rapid approach than conventionally used.

[0094] DHFR2 has been used in the successful discovery and sequencing of several new DHFR genes. These genes confer resistance to TMP and other folate antimetabolites in WT as well as clinical isolates. Additionally, many of the WT strains produce novel folate antimetabolites.

[0095] The above discussion provides a factual basis for the use of the methods and probes of the present invention. The methods used with and the utility of the present invention can be shown by the following non-limiting examples and accompanying figures.

EXAMPLES

[0096] General Methods:

[0097] General methods in molecular biology: Standard molecular biology techniques known in the art and not specifically described were generally followed as in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York (1989), and in Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, Baltimore, Md. (1989) and in Perbal, A Practical Guide to Molecular Cloning, John Wiley & Sons, New York (1988), and in Watson et al., Recombinant DNA, Scientific American Books, New York and in Birren et al (eds) Genome Analysis: A Laboratory Manual Series, Vols. 1-4 Cold Spring Harbor Laboratory Press, New York (1998) and methodology as set forth in U.S. Pat. Nos. 4,666,828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057 and incorporated herein by reference. Polymerase chain reaction (PCR) was carried out generally as in PCR Protocols: A Guide To Methods And Applications, Academic Press, San Diego, Calif. (1990). In-situ (In-cell) PCR in combination with Flow Cytometry can be used for detection of cells containing specific DNA and mRNA sequences (Testoni et al, 1996, Blood 87:3822.)

[0098] Recombinant Protein Purification

[0099] Marshak et al, “Strategies for Protein Purification and Characterization. A laboratory course manual.” CSHL Press, 1996.

Example 1

[0100] 1. Preparation of Biotinylated Sequence-specific Probes.

[0101] 40 ul (100 pmol/ul) of the DNA primers StsC03, StsC04, IPNS05 and IPNS06 were labeled with S-S photobiotin (Vector Inc., CA) according to the manufacturer's instructions. These primers were designed for the amplification of the stsC and ipns genes of S. griseus and S. clavuligerus, respectively. After n-butanol concentration and EtOH precipitation, the primers were diluted in 200 ul water (20 pmol/ul). PCR reactions were carried out using, as templates, pT7Blue-3 (Novagen) plasmids containing the previously cloned probe sequences: pT7stsC (stsC from S. griseus) and pT7ipns (ipns from S. clavuligerus).

[0102] Reaction mixtures contained the following: 2 ul primer 1 (20 pmol/ul) (StsC03 or IPNS05), 2 ul primer 2 (20 pmol/ul) (StsC04 or IPNS06), 2 ul template DNA (10-100 ng/ul) (pT7blue3/StsC/S.gr or pT7blue/IPNS/S.cl), 2 ul buffer, 2 ul dNTP mix (2 mM), 10 ul water, and 0.2 ul (0.2 u) Taq polymerase. Cycling was conducted as follows: five minutes at 95° C., followed by 30 seconds at 98° C., 30 seconds at 52° C., and one minute at 70° C., repeated 34 times. Finally, the reaction was heated to 70° C. for ten minutes followed by holding at 4° C. until analysis of products was performed.
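The total reaction volume and final concentrations implied by this recipe can be checked with a short calculation. This is a worked-arithmetic sketch using only the volumes and stock concentrations listed above; it relies on 20 pmol/ul being equivalent to 20 uM, and the dictionary keys are labels chosen for this example.

```python
# Volumes (ul) as listed in the reaction mixture above.
components_ul = {
    "primer 1 (20 pmol/ul)": 2.0,
    "primer 2 (20 pmol/ul)": 2.0,
    "template DNA": 2.0,
    "buffer": 2.0,
    "dNTP mix (2 mM)": 2.0,
    "water": 10.0,
    "polymerase": 0.2,
}

total_ul = sum(components_ul.values())


def final_conc(stock_conc, added_ul, total):
    """Concentration after dilution into the full reaction volume."""
    return stock_conc * added_ul / total


# 20 pmol/ul is the same as 20 uM.
primer_uM = final_conc(20.0, 2.0, total_ul)
dntp_mM = final_conc(2.0, 2.0, total_ul)

print(f"total volume: {total_ul:.1f} ul")
print(f"each primer:  {primer_uM:.2f} uM")
print(f"dNTP mix:     {dntp_mM:.3f} mM")
```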

[0103] 10 ug of the PCR product mixture was purified on Avidin DLA beads (Vector) to separate biotinylated from non-biotinylated probes. The yield of biotinylated probe was 4-5% (0.4-0.5 ug, FIGS. 12 and 13). The biotinylated fraction of the probe was used for RecA capturing, and the non-biotinylated probe was used for alk-direct labeling (Amersham) in hybridization screening.
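The yield figures can be related to the amount of probe consumed per capture reaction (0.1 ug, as given in the RecA capture step). The Python sketch below works in whole nanograms to avoid floating-point rounding. The function name is hypothetical, and the count of capture reactions is an inference from the stated numbers, not a figure reported in the text.

```python
def yield_mass_ng(input_ng: int, percent: int) -> int:
    """Mass recovered for a given percentage yield, in nanograms."""
    return input_ng * percent // 100


input_ng = 10_000             # 10 ug of PCR product loaded onto the beads
probe_per_capture_ng = 100    # 0.1 ug of biotinylated probe per RecA capture

low_ng = yield_mass_ng(input_ng, 4)    # 4% yield
high_ng = yield_mass_ng(input_ng, 5)   # 5% yield

captures_low = low_ng // probe_per_capture_ng
captures_high = high_ng // probe_per_capture_ng

print(f"biotinylated probe recovered: {low_ng}-{high_ng} ng")
print(f"capture reactions supported: {captures_low}-{captures_high}")
```

A 4-5% yield from 10 ug therefore corresponds to the stated 0.4-0.5 ug, enough for roughly four to five captures at 0.1 ug each.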

[0104] 2. RecA Capturing of Specific Target Gene Containing Cosmids from pFD666/S.gr/library.

[0105] 5 ul (0.1 ug) of the biotinylated probe was denatured by incubating for ten minutes at 99° C. and then mixed with 50 ul RecA buffer (25 mM TrisAc, pH 7.5, 10 mM MgOAc, 2 mM CoCl₂, 1 mM ATP, 2 mM ATPγS) and 5 ul (2 ug/ul) RecA (NEB). The mixture was then incubated at 37° C. for 30 minutes. 2.5 ul (2 ug/ul) of the CsCl-purified cosmid DNA was then added, and incubation was continued for an additional hour at 37° C. To remove excess RecA, 5 ul (50 ng/ul) of lambda HindIII-digested DNA was added to the mixture, which was incubated for ten minutes at 37° C., followed by the addition of 2 ug/ul Proteinase K and 0.2% SDS. Enzymatic digestion was carried out for 30 minutes at 37° C., and the reaction was then stopped by adding PMSF (100 mM) to a final concentration of 3 mM.
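The quantities and incubation times in this capture step can be summarized in a few lines of Python. This is a bookkeeping sketch only: the step labels are descriptive names chosen here, not terms from the protocol, and the masses are computed directly from the stated volumes and concentrations.

```python
# (name, volume in ul, concentration in ug/ul), as stated in the text.
additions = [
    ("RecA", 5.0, 2.0),
    ("cosmid DNA", 2.5, 2.0),
    ("lambda HindIII DNA", 5.0, 0.05),   # 50 ng/ul
]
probe_mass_ug = 0.1  # 5 ul of biotinylated probe

masses_ug = {name: volume * conc for name, volume, conc in additions}
reca_to_probe_mass_ratio = masses_ug["RecA"] / probe_mass_ug

# (label, minutes); labels are illustrative, durations are from the text.
incubations = [
    ("probe denaturation at 99 C", 10),
    ("probe with RecA at 37 C", 30),
    ("cosmid DNA added, 37 C", 60),
    ("lambda HindIII DNA added, 37 C", 10),
    ("Proteinase K and SDS digestion, 37 C", 30),
]
total_minutes = sum(minutes for _, minutes in incubations)

for name, mass in masses_ug.items():
    print(f"{name}: {mass:g} ug")
print(f"RecA to probe mass ratio: {reca_to_probe_mass_ratio:.0f}:1")
print(f"total incubation time: {total_minutes} min")
```

Read this way, the reaction uses about 10 ug of RecA and 5 ug of cosmid DNA per 0.1 ug of probe, and the incubations add up to 140 minutes before bead separation.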

[0106] The captured DNA was separated on Avidin DLA beads (20 ul) prepared according to the manufacturer's instructions. At the final step, captured DNA was eluted with 100-200 ul of 0.1M NaOH, 1 mM EDTA, EtOH precipitated, and dissolved in 20 ul water. The DNA was electrotransformed into E. coli XL1 (Stratagene). Positive clones were detected by colony hybridization with alk-direct StsC probes (FIG. 14). DNA from positive clones was purified and verified by dot-blot, Southern hybridization, PCR and bacterial growth on LB agar plates with streptomycin (20 ug/ul).

[0107] 3. Direct Cloning of RecA-Captured DNA Fragments from S. griseus Chromosomal DNA.

[0108] RecA capture was carried out as described above, but instead of cosmid DNA, 5 ul (1 ug/ul) of chromosomal DNA from S. griseus digested with MboI/Sau3AI and treated with CIP was used. After RecA capturing and binding to the Avidin DLA beads, DNA was eluted with 200 ul of 2.5 mM biotin and directly ligated into the pSCOS1 cosmid vector (Stratagene). After packaging into lambda extracts, clones were plated on LB agar with Amp (50 ug/ml) and Km (25 ug/ml). Positive clones were detected by colony hybridization with alk-direct StsC probes (FIG. 14). DNA from positive clones was purified and verified by dot-blot, Southern hybridization, PCR and bacterial growth on LB agar plates with streptomycin (20 ug/ul). Positive clones most often contain related pathway genes, as confirmed by additional hybridization with related gene probes, such as strD, strB, and stsC (FIG. 15), and by PCR (FIG. 16). Additionally, heterologous expression of these genes is often observed, as judged by antibiotic resistance (FIG. 17) and by HPLC chromatographic profiling of cell extracts and fermentation broths, further demonstrating the utility of this invention for expression cloning screening.

[0109] Results from both examples clearly demonstrate the advantages of targeted cloning in providing highly enriched libraries of specific genes and associated sequences, including biosynthetic pathways. Library enrichment of several hundred fold for specific genes and related biosynthetic pathways has been demonstrated by use of the present invention (FIGS. 14, 18).

[0110] Throughout this application, various publications, including United States patents, are referenced by author and year and patents by number. Full citations for the publications are listed below. The disclosures of these publications and patents in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art to which this invention pertains.

[0111] The invention has been described in an illustrative manner, and it is to be understood that the terminology which has been used is intended to be in the nature of words of description rather than of limitation.

[0112] Obviously, many modifications and variations of the present invention are possible in light of the above teachings. It is, therefore, to be understood that within the scope of the described invention, the invention can be practiced otherwise than as specifically described.

TABLE I List of example degenerate PCR primers for target gene cloning

Gene   Function                                Pathway         Name      Sequence (5′ to 3′)
aac6′  Aminoglycoside 6′-N-acetyltransferase   aminoglycoside  AAC6001   CGATGCTSTAYGARTGGCTA
                                                               AAC6002   TGGCGYGTYTGVACCATGTA
                                                               AAC6003   CCGACRCTYGCKGACGTACA
                                                               AAC6004   CATCBGGSGTSGTTACGGTA
acvs   alpha-aminoadipyl-L-cysteinyl-D-valine  beta-lactam     ACVS3     TGCTSGTSGGSGARGAGCTGA
       synthetase                                              ACVS4     TCSACYTTGCCRTTGACRTTGA
                                                               ACVS5     TTCACCGARRCSGCGTTCGTCA
                                                               ACVS6     CGCCGGSACCATSAYCCGGATSA
dhfr2  trimethoprim resistance                                 DHFR2-1   GATCGCGTGCGCAAGAAAT
                                                               DHFR2-2   CCACSTTTGGYHTRGGRGATCG
                                                               DHFR2-3   AYRCGTTCAAGYGCMGCMACAG
                                                               DHFR2-4   GATAAATYTGYACTGARCCK
ipns   isopenicillin N synthase                beta-lactam     IPNS3     TTCKSCGAGGACCACCCGMWGAT
                                                               IPNS4     GGAAGTAGTCGTKSGTGAKGT
                                                               IPNS5     GATGCACGAGGTBAACVTCT
                                                               IPNS6     TCTGGWASAGSACGGTGATCA
NIM    Nitroimidazole resistance                               NI01      ATGCSRCGYAARCGGCARTTGT
                                                               NI02      CGATRGCYTCYTTGCCYGTCAT
                                                               NI03      CGTTGGCYCTTCATGGSGATGA
                                                               NI04      CCTTTGGCTATYTCRGCYTCCAT
PKS    Polyketide synthase                                     PKS1      GAGTTCGACGCSGVSTTCTT
                                                               PKS2      GGTGTGNCCGATGTTGGACTT
                                                               PKS3      GCARCGVCTCCTGCTSGAAA
                                                               PKS4      TTGCTSGCRCCGTCCTGGTT
strB1  Amidinotransferase I                    aminoglycoside  StrB1001  AGCGTSTTCGCSGTGGAGTA
                                                               StrB1002  GTGCTTCTCSAGMAGCTTGA
                                                               StrB1003  TSAAGGAGACCGARGASGAA
                                                               StrB1004  CGSACCACSAGCAGGTTCA
strD   dTDP-glucose synthase                   aminoglycoside  StrD1     CTTCTAYGSVCTGGAGGCCA
                                                               StrD2     GAGGRGATCTGSSCCDTGCT
                                                               StrD3     GTGGGMGACGGYTCSAARTT
                                                               StrD4     TYGCGCAGCACGATGGAGAA
strE   dTDP-glucose dehydratase                aminoglycoside  StrE1     CACTACGTCCRSACCCTCCT
                                                               StrE2     GCCAKBCCGTCGSYCAGGT
                                                               StrE3     CSGYSTTCGTGCGGACCAA
                                                               StrE4     CGGTCSKSGACGTRCTCCA
stsA   L-alanine: N-amidino-3-keto-scyllo-     aminoglycoside  StsA1     ATCGTSCCYGGRTASATGTT
       inosamine aminotransferase                              StsA2     GATGGARCGSCCSAGSACA
                                                               StsA3     CTTSTCCCTSAAYCASTAYAAGA
                                                               StsA4     GCGTTCCRGGYTCRAAGGAA
stsC   pyridoxal phosphate dependent           aminoglycoside  StsC1     GCMTSCCSSTSATCGAGGACT
       amidotransferase                                        StsC2     GTCGAACCGGGSMSGGGTCCA
                                                               StsC3     CGGCGCRYSGGRGTSTTCA
                                                               StsC4     GTWCAGCGGGTTGKCGTTCA
Taxol  taxadiene synthase                                      Taxol01   ATGATGTGGGTYTGSTCSAGA
                                                               Taxol02   TTTYTCRTCSCCYGTRTTCAT
                                                               Taxol03   TCCYGGYCCKGTSGTMATGAT
                                                               Taxol04   ACCYTCSAASGCRTTYAACAT
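The primer sequences in Table I contain letters other than A, C, G and T (for example S, Y, R, K, M, W, B, D, H, V and N). These are read here as standard IUPAC nucleotide ambiguity codes, so each primer stands for a pool of concrete sequences. The Python sketch below counts and enumerates that pool for a given primer. The function names are illustrative and are not part of the described method.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer.

    The list grows multiplicatively with each ambiguous position, so this
    is only practical for primers of modest degeneracy.
    """
    choices = (IUPAC[code] for code in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


aac6001 = "CGATGCTSTAYGARTGGCTA"   # primer AAC6001 from Table I
print(degeneracy(aac6001))
for sequence in expand(aac6001):
    print(sequence)
```

For AAC6001, the three ambiguous positions (S, Y and R) each allow two bases, so the primer represents eight sequences.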

[0113] TABLE IV Unique Cloned Genes Used as Probes % simi- % larity Clone Name or with Gene Function Accession No. prototype Sequence aac6′ Aminoglycoside AF034958, U59183, X60321, L25666, 6′-N-acetyltransferase S49888,S45954, U90945 LDSAA3-33 50.80% ccgacactcgcggacgtacag LDSAA3-51 49.50% ccgacgcttgctgacgtacagg LDSAA4-56 36.10% catctggggtggttacggtacag LDSAA4-57   48% ccgacgctcgcggacgtacatg SCDSAA2-49 40.60% catcgggggtggttacggtattc SCDSAA2-52 36.20% catcgggggtggttacggtataa SCDSAA3-85 48.60% ccgacgctcgcggacgtacatc dhfr2 Trimethoprim resistance K02118, X04128, A12434 pASDMN1 89.80% gatcgcgtgcgcaagaaatctg pASDMN2 87.10% gataaatatgcactgagccggg pASDMN5   89% gataaatctgtactgagcctgga NIM Nitroimidazole resistance X76948, X71443, X71444, X76948 LDSN1-12 45.50% cctttggctatttcggcttccatctc SCDSN1-22 49.10% cctttggctatttcggcttccatcg SCDSN1-28 48.10% cctttggctatttcggcttccatgc LMSN1-65 47.30% cgatggcctctttgcctgtcatttt LMSN1-86 48.90% cgttggcccttcatggcgatgat SYDSN1-83 47.60% cgttggctcttcatggggatgatg SYDSN1-88 48.10% cgttggctcttcatggggatgatc PKS Polyketide synthase AF007101, AB032367, M63677, AF016585, AF079138, U24241 LMSK1 42.20% ttgctggcgccgtcctggttggtt LMSK25 49.10% ttgctcgcgccgtcctggtttgcg strB1 Amidinotrasferase I X78973, Y00459, AJ006985, X78972 SCDSB31 47.60% aaggagaccgaagaggaaca SCDSB32 39.80% tgaaggagaccgcaggagga strD dTDP-glucose synthase AJ007932, AJ006985, Y00459, AF055579 LDSD2-16 ttgcgcacgacgatggagaact SCDSD1-30   49% caaagcggaagcgatgcggat SCDSD2-40 45.30% tgcgcacgacgatggagaaag SCDSD2-44 48.90% tcgcgcacgacgatggagaat acatgaacctggcgcttgggttagaagaagataccggtctagagcacttgaagtaggaggaggccggaaggggatggggaaaacgctctcacaggtgga- caagggaacgaggcagggtcttatcttaaggctgatctccgagaa cggaatgcagaacttccacgggggcattaaaacgttcacgaaaacgggcgatagtttgcggtgtcaggccgatttccggcaccatcaccagcgcctgtt- tgccctgagcgagcacgttttccagtacgctgagataaacctccgttttac gcaacttcacgccatctgccgagttcaacatctcttgccgacccagaagccgcacgcgtagtgttcacctccggcggttccattagtgatgatgggcct- cgatctcacaaccagaccgtttgcacccggacgtgattgctcggatggaaagg 
acaaacctttatttcaagattaaagaagataagcgcaaggctgcgagaggtgaataatgcctccatcacttacgcaaaagccgcttgctgctgctcatt- ggtggcgcgacgcaattgctcatagcactcacgtgttaatcactcggcccaa gaggcgatatcattttctacaggaatacgcaccaaagactcaatcagattgcgtccaacaagaccggcattg actgcggcttcttctttttcttctttcttcttgttacacgctgtaaacaacagaagactgcttagcgcaatacttgcgacaa ttcaagcgtccatcgctcggcatggtgatgtggatctggatcagcgtgatgaacccgcatacgcaagggtggggcttcgcgcgcgaagcgttcgccgcca- tcatcgcggtgacgacggtcgccgccatggccacgaacgcgtaccgg cgccgcctggcagggtcgagttgtcggctggtactgcacagatctgacccctgaaggctatgccgtcgagtccgagtctcaccccggctcagtacagatt- tatc gtgagactcggactgacggcatagccttcaggggtcagatttgtgcagtaccagccgacaactcgaccctgccaggcggcgccagatttcttgcgcacgc- gatc tgcgactcggactcgacggcatagccttcaggggtcagttttgtgcagtaccagccgacaactcgacctgccaggcggcgccagatttcttgcgcacgcg- atc gacgacagtgaagttctccctgtagaagaagtcgaggtacccctcgttctgaagaaatgtccctttgaccgtggaccgccttttggttatcgagcgcggc- gccataatccgagggtatgggggcgaggtcggcataggctggaacgcatt tggcctccttgcctgtcatccacttcttcaacagagatatttgagaaatcagaaatttctgtctttaaaggagatgtctggctgcgggaaccgatcatct- gtagctgtgttcttataatattctgaattttgcacgcttgtttcttctgcttttttctaaag cgtcatgacaccctcctggtgttcgtacaatttttcttttatcacctttgcgccctgttcttcttctacaccgtcaacggacttactaccatcggtaaat- ggccgcggcgtatcatattcgccctcttatttctcaaccctgccatcctttatctcaggtaac tcgatcactaccaccgggcgtgccagtcgtattgccagcgcctgtgccgtctcgccttgtgtctcaatcaataaaac gtcagcaccacgcctgtcgcggcgctcaaaagatagctggtggccgagcatgacgggaaacatgctgcgatcctgtgcgacacggcggatcagcgattcc- tgcgaaccgataccgcagatctggacgccaagattggtgacggccgt gtcagcaccacgcctgtcgcggcgctcaaaagatagctgtggccagagcatgacgggaaacatgctgcgatcctgtgcgacacggcggatcagcgattcc- tgcgaaccgataccgcagatctggacgccaagattggtgacggccgt tgccggcctgaggggctgcgcgcacggaggaaagataaggctcgtaggtcatggccgcgtcgttctggccggcgatgaaggcctgagcggcaggacccgg- ctccatgttgacgacggtcacgtccttcacggagagaccgttcttctt gcacattgccattacccattacgatggtaatcatcaccgcgatagcgcaaattgcaccgcctcctgcggctgttttcccttcataaagacctcataagcg- aatttttacgctccaggacaaacacccattcacagccaataccgactgactc 
agacgcaggaggcaagtcgcagtaccagtcgtagaagcttaagcaagtaccgccaatcagcgagagatagcgtgcacccgatgcgtaagaaaccatcgac- attgccggaattggcgagaaccagcaacacggtccgggccgct tgggctggaagatggaaaagccaaaaggaaccttttacgtatgtggcgtgtagaattccgagaaacgtttgagaacatctcaccaattctccgattactt- gctggagcatgctcatgtcgttgtcacaccgggcgaaaatattcggaatcatt acacggaaataagaactgatgtgctcgcaggaaataaagacacagggaaatatgatcatagatactcaaacattccttaactatagggagcagagcgagg- cattaaaggcctggcagaaatcaaatcctaaggaaggtgaatcatt gatttcgctgtcctcgatccggcagtcctcggagacggacgtgaacggaccgatgtaggcgtcgttgaccaccgtgccggcaccgatgatggcaggcccg- acgatgcggctgcacactgacgctggcgccgcctcgacccggaccc gttgcgggcaaaagtcgcggccttagcggcgaacaggctgctgatgaaaatgatcgaatggcgtcgaggaccagggtggcgtactggtcggataggatcg- ggctcgaggtggcgtacctataactttcccgaggaatcgcctggac ccggctatatggacgaagatttcttcctatatgccgaagaagtggagtggtgcagccgtttacgtaagctgggcgaattagcgatctttggagacatcaa- cattattcaccttcagggtgagaccaccggagacgcctttgactcagccgat tgggtatcgtgaaggtataattgaggaggagtataaagtaccagcgtgccccgaattgtttcctgaccctgacatgattggataacatgagctggaggcc- tttacggtcatacagggccgtagtgagccttatcggcgtgagtcaaagggc gcattggggaccagcaggatctggtcgcggccctgtcggaggctggggttgaggtggcccaggcgaccgtgagtcgggaccttgcgagctcggggtccta- aaggtcggtaaccgctatctccggc gtaaccacgcccgatgatcacgaattctggatccgatacgtaacgcgtctgcagcatgcgtggtaccgagctttccctatagtgagtcgtataga caggcggccgccggagagctgttcagcgacatcatgaacttcactctcaaaacgcagtcgaaaactacggccttgctggcggccggtgcacgacgccac tggtaaccgatggtccctggaagatgtccagccctaccataccatcaccaaagatattgttggtgtatggcactgtatgctcaccggacacaccggaaaa- gaccatcattgctccggtaa aataccgtaaccacccccgatgatcacgaattctggatccgatacgtaacgcgtctgcagcatgcgtggtacgagctttcctatagtgagtcgtatagagg tcggaaccaggtaggtgggttcccgggaggtggcctcggtcataggacaacgtccgaggatcattcacgtcgcccaatgggcggttggggccgt ataccagaatagcaaccaaaggcagcaagcagtacaacaactgccgtttggcgccgcatatctgaattcgtcgacaagcttcttgagcctaggcta tatatcaccaggtgacgtctatttcatcgccatgaagggcccacgatctgaattcgtcgacaaggcttctcgagcctagggcta atggaggcgtaaagcgcgaattgttcgacaccgagatgacgggcaaggaggccatcatcgccatgaagagccacgatcacgaattctggatcg 
attgaggccgaaatagccaaggatctgaattcgtcgacaaggcttctcgaggcctaggctagggctctaggaccacacgtggtggggggcccagctcgcgg- cgcacaattcactgc cagcatccaggagagggcgaaatagggcgacgtgccgggcgcggaggccgccacctgctggcccttgatgtccttgatggaggccgagatagccaaaggat- ctgaattcgtcgacaag atccctttagaagacacaggataatgcaaatcacttgttagctacgtttcaagatatacattattgctctaattaattatttttattagggatagataggt- ggaccat agtttttgatggtgtaacgttagatgcggcgatcagttcgttcacctcctgccaggagaacgaacaatccaccgccgtcacgcgcctgcttaaggctttgc- gct cggaaaaaggcatgtcagaatatcgatggtgtcgaagcaggaggatctgcgggaatttgtcatgcggattcaaagctgaacctgctcgtggtgcg accaactatttcaacaatatcagaattgaataagaaaaaatatattttgagaaattgccacaaaaagctgtctattttggacagcttttataaactactga- actgctagtggtgc ggccgatgatctcgctgctctcgtcgaccgtgccctcgaccaccggctccgacggtcctccaaggaccgaccggtggacctccaagcatgtcgggtaacgt- tgccggtgtccttccaggtagccgagagaaaggtccgttggaggcga gggatcatcgtaattgggtacaagttccaggaaacttgaccagagttctggctggcggacctaggtggatggtctaggacgcggctccatgccgataggtg- gagggcgtcggatggcacaacggccgaaggtcag aaggctactacggcctgtatgaccgtaaaggcctccagctcatgttatccaatacatgtcagggtcagaaacaattcggggcacgctggtacttat gtctccgggtgggtgctcacccgggggtgagtgatggtggatgtgcgccaaagagttcgggctaattggggcagcgttacggtggaacgggctgcgaggcac acgtacacggctggctgggtcaatacagcacactgtggatggcgtggtgggtgaaaattctatcaggctcggccggcgcacagagaccggctcatatatag- acgcaggacggcgctcttggtgaattgccggtgataaaaa SCDSD3-57 48.70% cgtcgtttgagcggacatgcgct SCDSD3-62   48% gcttgcgggtggagatgatcttg SYDSD1-19 45.80% cgcagcacgatggagaagtgt SYDSD3-52 48.30% cgcgcacgacgatggagaatc strE dTDP-glucose dehydratase AF055579, AJ006985, AJ007932, X62567 SCDSE1-79 51.80% tcggtcgggacgtgctccacac LMSE1-57 48.40% ccgcgttcgtgcggaccaactg LMSE1-63   49% catggataacgcctggcaggg LMSE1-68 47.40% cactacgtccgcgaccctcctg LMSE2-84 49.80% cggtgttcgtgcggaccaaaag SYDSE1-54 35.80% cggtccgggacgtgctccagac SYDSE1-55 37.70% cggtcgtggacgtgctccaggc SYDSE1-61 49.10% ccagaattcgtgatcggtgttcgt SYDSE2-66 35.30% cggtcggggacgtgctccagg stsA L-alanine: Y08763 N-amidino-3-keto-scyllo- LDSA3 47.70% gcgttccgggctcgaaggaaa inosamine aminotransferase 
LMSA1-6 47.20% gcgttccgggttcgaaggaagc LMSA1-9 47.40% gcgttccgggttcgaaggaagg LMSA1-17 48.80% gcgttccgggctcgaaggaata LMSA1-26 52.80% ggcggttccgggatcgaagga SCDSA2-18 44.00% cgtagagatggggtctctccatg SCDSA2-20   48% cagcgcggcagtgggtgggtta stsC Pyridoxal phosphate Y08763 dependent LMSC1-29 48.60% gttcagcgggttggcgttcagaa amidotransferase SYDSC1-1 50.20% gttcagcgggttgtcgttcatggc SYDSC3-22 46.10% gttcagcgggttggcgttcaggg Taxol Taxadiene synthase U48793 LDST1-81 49.50% tcccggtccggtggtcatgatcct SCDS1-33 40.80% accgtgtcgaaggcgtttaacat SCDS2-24 47.10% tcctggcccggtcgtcatgatgt SCDS2-25 50.20% accctcgaaggcgttcaacatc SCDS2-42   49% tcctggtccggtcgtaatgattcc aaaggcagtgaaattatccgcctggttgaagaaagcgatccggtagcggaactggcattgcgtcgctacgagctgcggctggcaaaatcgctggcactgtc- gtgaatattctcgatccggatgtgattgtcctggggggcgggatgag gcctcggggatcgttcatcgccaggatcaccggccttggacgtcggttcatttccaggctctggccaggaacatctgggtcttcggcgtcggcgaacagga- tgcggcggcctcggcggtattgcgctcgacatcaccgggtcggagtcg ggaacgtctgctctacaatgctttacgggcatcgatcaggatcagggaaacctgcgcattggaagcaccggttaccatgttacgggtatattcgatgtgac- ctggtgtatctgcaataatgtatttacggctgggggtggaaagtagatatg gatcgggtccgcttcaacgatctgttgattggcagcacagtctcgaaccggctcgaaggtgggaacggcaatgacaccttccgcggcacgccggagcagcg- tattgatcggtggtgacggcacgggcgacacggcagactattca gcgagcaacaactaccaggacaagcaccaggccctgtcccgctatgcgaacgtgatgacgtgcagccgcaccaaggtgccctggcgcccgggccgcggcta- caacagcagcgaaccgaagatctacggcttgcagaccgcca gcgatatagaacgggccagggcaggccgttggctgcgaaaatagggcctggtcctatcggcgggctggatctccaggtgcgcatcctgatgaggctgagag- ttggcagggtagcccggctgcgaccaggcagggtgaccgggtc aacacctactgtccaacgtcggtctgttccgaagggtggtgtcaatcaggtgggtggatcagagtgggctacaaggtccttccagctggggtcatcccatt- accgggtcggacactgggagcaggacgacctggaaaagccctgctac aagtcggcagcaatcttttccagcccgcccagcgacatttcattttgctgcgcgatataggcgtcatacagagccatttgctcgttgtatttcgctatctg- tgcatctgttggttcatccggtaactcttcggcgggtttaaccgctttcagtttcttac gaacgaatttcagacatcagcacccaactgaacgcctttcccggctgtgaagttgctgtcagcgacgcgccgagcggtccagttgattgtggtggtggaag- 
cagaagacagcgaaacgctgatccaaaccattgagtcagtacgcaa gctgcgcgaccgcgaatatgtgaagaccgaaaagaagcggctcgtccccgaggacaaaggccggatcgtcaccgccttcctggagagcttcttccgccgct- acgtggaatacgacttcacggcggatctggaggagcagctcgacc gacctcgtccaggctgaggctgatttcatcgagccaggcgagatagcagttgaggtcgtcggtgtaggtggcgatcgtgcccaccgacgcccccagctccg- tgggccagggcatcgagccagaggtgggctaatcgctgattggtcc cggaccaatcccgctgatcacctcgacctcacgcataggatcggatcaggtgctgatctcgcaaacccttaggacctgtcgtcagagcgaaggggagggga- ctgttattccaccatctctgtgtcgaactcggccagagtgctccgc aaatgaactgatccttctccggcttgccgcgggcctgctgatagtagcggatgaagcgcacgctggaatcgaccgcgtccgatccgcccagggtgaaatag- atgtggttgagatcgcccggcgcccgctcgggctagtgccgaggca cgttcagaaggtcagctatatcggccggcgattctttgcttcgtacctgcgcgacggccgcaccgaagtaaggatgtacgatgaggcgcagagtctgggcgt- cgtacctctgccgggtctcgggcgcgctgtcggttttgaaggacggaa aacttccagcaggcggaacgcctcatccctggcatcgcatttcgctgatatcgttcaaccgttcaacgcgcacgttggtaatttccaacagaatgcgtgatg- cccatcgcggcatgtgaattgatggacgccacccaccatcaaactttcat cttggacttaatgagcaaggagcggaggtaatcgaaatggcaccatttccaatcgaaacgatactggggaaagccggcgccctctctgtcttcctgttcatc- ggagtcgcctttggatgggtgttggagaacgccggattcggcaactcac ctgtctcatgaacaggatatgctgcgtcttcgcatcatgatctggcgcactcttgcgaccgacacctttgacatcgctctgccggttaaccagtcctttgat- gtatgggcaaccatcattcgtggcaattccagactgtatatcgcgacattatt accgttcagaaggtcagctatatcggccggcgattctttgcttcgtacctgcgcgacggccgcaccgaagtaaggatgtacgatgaggccggcaagagtctg- ggcgtcgtacctctgccgggtctcgggcgcgctgtcggttttgaagga tgccaggctgttatcgaactcctgggctcaagtgatccttctgccttggtctcccaaagtgctagggttaaaagtgctggggttataagtgtgagccactgc- ctctagcccagttttttagttgttacaaattgccaagtaaggactaatcca ttgctctgttagctgtgctggtactggttggagccggggtgttcttctacgtcaaggggatgcccggatctcattcggatgccgctcctcaaccaacccaggc- accaatctctacctctacgccagaggtcaggccaacgcgaactgtgacg gcagcgtggctgggtgccggatggtgcgatccacgcgatcgccgatgtgctgggtattccggcaagcgacgtcgaaggtgtggcacgttctacagtcagatct- tccgccagccggttggtcgccatgtgaatccgttattgtgacaagcgt cagaccagcagcgtatgctcctccagggcttttgcgatgggcacaccgcgggacatggcctgctgctcgcaagtttccgcgtctgtccggatcggcgcccgaa- 
gtgacccgtgaacagcgccgagtccttcaggcccgcctctcgc atttggtgcatttgcctgcccttgctgcctggaaccctgaaaatcccggtgactttggcggtttgggcatgagcagtgacgagtcagccattttctatgcatt- cggtattggcgatggcagctggggagcattttatgatgtttgctgcctgtaccc tcgccctctgctcacgaaagatgctgtccgcccatcggaagaactcactatttcgcggttgtgttggtgggatcccccggagcccgcatcgcgcgtgcgcatg- agctcattcgagaggtgggcgacgagacttgagaggaaagcgctgg gccggtggatgagttacagggaagtgcagagcgactgaagaaacgcctcgagaatatgggtgagatcaaccctaccgcaattgaggcgtacctggaaatgaa- gaaacgttacgaattcatacttgaaacagaaagacggatcttgg tcacgttattatgtagtctgccggacaccttattacaggatgagtatcagcagaagagtgtgaactatcaggcggtgacatctgtgtggactacagtcagcat- actgactgcgctgtgatggctctacgatgctcgcgaaaaacaccccc gccttcagccttcattctcagtagttaatgccatctggatggaaaacagaggaatctactgctgtaccgacacatacgacggaggaggtgaatatcggcttga- aaatggcatcgatgcgcggagacaacagatgcagcaaaggagaa gagctcgtcagcaatttcagtactacggaactgaaacttgtcagcctcatcgggacctattattatacctattctacctgcagccttattgccggaattggcc- tggataagttcggtggcaaaagatcgctttttgcaggtgctttaattctgggaat caatgtagaccgtttatatcaaacggttgggcagttgattaacaatttggtcttcggcggcgatgtgaacgccggtgcgtaggcgacgacgggtgaatcacgag- ttctg gggctgaccaggcgatagcctttggcacttcaggtgggtctaggcggccgggccggtggcgggccatgcccatgatcaggatctgcgcatcgccagcgaccacc- ggttgctcgt gccacatcaatggtgatacctgttcacgttcagccacaaggccgtctgtcagcaatgacaggtctgtaaaatcaagtcctttgcgttg gcgtcctcggccggcatcctggtcacgttgactgccatctccaacggagcaacagacagggtgccgggggggaccgaacttagagtgttctaatgcgagctaga- gccatgct cgtggtcggcccggcggcgaggaaatctacaccgacgaatatggccgggtgcgcgtgcagttccactgggaccgggagggcgcgaacgacgagcgcagggtcag- cctggataccgcgtccgcac ggcgagcatttccattgatacagttgctctggtgagcagggcttttccaggtcgtcctgcgtccaggtgtccgacgggtgatgggatggagccagctgggaagg- actggtgagccactctg aggagcaactgtgatcaatggacatgcttggctgaccggtcaccctggctgggtcgagccggctacctgccaatctcagcctcatcaggagtgcgacctgggag- atcaaggcggccataggacaggcca ggttttacctgcctcggcaaaccgtctgagcattcaggatccccacctttgaagggtcaaggttaaggggcattgcagataatgcgcttgagcttctggtgctg- cgtttttta 
gtagagggcgtgctggcggtgtcgctgggttatcaccagcaggaagagcaaggtgaggaaacaccatgaaactcagtcgtcgtagctttatgaagctacgccgt- tgcggcgctgcggcgcgtgccggtctc gcatctccaattccgagatcgactgggaagcaggtgcttcgcgatttctggcgcgacttctcggcagccatcggcgagacgaagagctgccgcaccgcggagt cacgaagaccagcgttcgtgcggaccaacaggggccgtactcctgtattctttcagaaggatctggggaagactcgaacttgctgga gctgtgatcagatcctccaaggcttctcaatcgggcgataaggcgatccagccgcggtgtgagaaagatcaggtagcggcttggttctccgacctgtagtgatg- cgccagc ggcggatgcccggctccgcgccgaggccgaaatagccggtcgcataaggcagctcccgcatctggcggctggcggcttccacgagtgctggcatgggccgtagc- cgggcgttgacgcg acgagacggaacgttctacgtattcacaagctacacggttccctccgttgtttaccactacgatttaaagaccacaagagcactcttggaagcaaccgaaggtc- gacgcggatctacgaaatatgagaccagcctcgtcttctacaaca tcacaggtgtgaggtttccaggtcgggcatcatcgggtatcgaccataaggccgtaatcaccagggtttttggtcgggaactgggccgaataaatccttgctgc- ggttcttctcatctgccacgac caagctggcagcacagttttatttcagagagatgaccgttctcaaggtcatgttcacggccatcgtcgtcgccatggtcttgatattcgcgacttcaggtctgg- ggcttctagacta agcgcgttaaatcttctggtgcgatggggatgtttgctggtgctgatgcagcatctttcttcaaacagttgccgaaggatttcttc gggaagacgagacggaaacgttctacgtattcacaaggctacacgggtccctccggtggtttaccactacgagttaaagacccacaggagcactccttggga aaagactggagtattttgtcaatgaacatgtttcaacatatgtatctcttacaaaatgcagctggtttaaatcctaaaggc ctcatgccacggtgacaacgatgagttctcccatacagatccagcttcctggcggggcggtggagtgtggacaaggggccttgatcgcaaatcctcgcaccacc- tgtctct gtctgtcatatcacggtatacaggtaatcggcgcgctcgagaaaagctgactcaccgggcacgacatttgataggcgcttaagctgctgccactgctgctggga- ct gaacatcgccgagcgatacgcccgtccattccgcgcacgcgaccccgccattggtccagggattgcctccgccttcggctccgaagaacgagcggccgt ctacgtacggcaatctttggctttagcagtcatttgcagttggtgcatggccgtgtg cgccgggtgatggaagcacacagtgctcaacgcggacgataccgattggtccatctgtttcgtataggtccatgtgcttctcaactacat atctggaattcgttcggacaaagctttcttcggagcctaggctagcttctagaccacaacgtgtgggggggcccgagctcccggccgcaacaatttcacattgg- gccgtcgtttttacaacgcttgttgtca cataccatatccgagcgagcgtgattataacaacgtgcttccgacaagcgagagcctcgcgctctggatagagatacatcgtgtcagattac 
atgatgtttgaagactactcttgcctgccagggagagtacatgccgaaagcagaaggcgtacacatcaaaagagatacatggcgataatacggaggatacaaca- ggcgggaacatgctgtgatg aggctgtctgtaatttctttgcatctcgcttattcaggtgtgtgttgcaggaagattgttgcagggagc caaagatggcacgcgcgtaccattgttcatcaccgcgcgcaaggatataagctggacggaacagaatcccctttaccctatgatacggcggatc

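One way to relate the cloned probe sequences of Table IV to the degenerate primers of Table I is to test whether a clone sequence begins with a sequence that a given primer can encode. The Python sketch below does this for two primers and two clone fragments copied from the tables. The function names are hypothetical, the degenerate letters are read as standard IUPAC codes, and the comparison is a plain position-by-position check, not a description of how the clones were actually characterized.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Two degenerate primers from Table I.
PRIMERS = {
    "AAC6003": "CCGACRCTYGCKGACGTACA",
    "AAC6004": "CATCBGGSGTSGTTACGGTA",
}


def starts_with_primer(sequence: str, primer: str) -> bool:
    """True if the 5' end of `sequence` is compatible with `primer`."""
    seq = sequence.upper()
    if len(seq) < len(primer):
        return False
    # zip stops at the end of the primer, so only the 5' end is compared.
    return all(base in IUPAC[code] for code, base in zip(primer.upper(), seq))


def find_matching_primers(sequence: str, primers=PRIMERS) -> list[str]:
    """Names of all primers compatible with the start of `sequence`."""
    return [name for name, primer in primers.items()
            if starts_with_primer(sequence, primer)]


# Leading fragments of two clones as listed in Table IV.
clones = {
    "LDSAA3-33": "ccgacactcgcggacgtacag",
    "LDSAA4-56": "catctggggtggttacggtacag",
}
for clone_name, fragment in clones.items():
    print(clone_name, find_matching_primers(fragment))
```

For these two fragments, LDSAA3-33 is compatible with AAC6003 and LDSAA4-56 with AAC6004. The same check could be run over the remaining entries once the table is available in machine-readable form.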

REFERENCES

[0114] Burke and Olson, “Preparation of Clone Libraries in Yeast Artificial-Chromosome Vectors” in Methods in Enzymology, Vol. 194, “Guide to Yeast Genetics and Molecular Biology”, eds. C. Guthrie and G. Fink, Academic Press, Inc., Chap. 17, pp. 251-270 (1991).

[0115] Capecchi, “Altering the genome by homologous recombination” Science 244:1288-1292 (1989).

[0116] Davies et al., “Targeted alterations in yeast artificial chromosomes for inter-species gene transfer”, Nucleic Acids Research, Vol. 20, No. 11, pp. 2693-2698 (1992).

[0117] Dickinson et al., “High frequency gene targeting using insertional vectors”, Human Molecular Genetics, Vol. 2, No. 8, pp. 1299-1302 (1993).

[0118] Duff and Lincoln, “Insertion of a pathogenic mutation into a yeast artificial chromosome containing the human APP gene and expression in ES cells”, Research Advances in Alzheimer's Disease and Related Disorders, 1995.

[0119] Huxley et al., “The human HPRT gene on a yeast artificial chromosome is functional when transferred to mouse cells by cell fusion”, Genomics, 9:742-750 (1991).

[0120] Jakobovits et al., “Germ-line transmission and expression of a human-derived yeast artificial chromosome”, Nature, Vol. 362, pp. 255-261 (1993).

[0121] Lamb et al., “Introduction and expression of the 400 kilobase precursor amyloid protein gene in transgenic mice”, Nature Genetics, Vol. 5, pp. 22-29 (1993).

[0122] Pearson and Choi, “Expression of the human β-amyloid precursor protein gene from a yeast artificial chromosome in transgenic mice”, Proc. Natl. Acad. Sci. USA, 90:10578-82 (1993).

[0123] Rothstein, “Targeting, disruption, replacement, and allele rescue: integrative DNA transformation in yeast” in Methods in Enzymology, Vol. 194, “Guide to Yeast Genetics and Molecular Biology”, eds. C. Guthrie and G. Fink, Academic Press, Inc., Chap. 19, pp. 281-301 (1991).

[0124] Schedl et al., “A yeast artificial chromosome covering the tyrosinase gene confers copy number-dependent expression in transgenic mice”, Nature, Vol. 362, pp. 258-261 (1993).

[0125] Strauss et al., “Germ line transmission of a yeast artificial chromosome spanning the murine α1(I) collagen locus”, Science, Vol. 259, pp. 1904-1907 (1993).

[0126] Gilboa, E, Eglitis, M A, Kantoff, P W, Anderson, W F: Transfer and expression of cloned genes using retroviral vectors. BioTechniques 4(6):504-512, 1986.

[0127] Cregg J M, Vedvick T S, Raschke W C: Recent Advances in the Expression of Foreign Genes in Pichia pastoris, Bio/Technology 11:905-910, 1993

[0128] Culver, 1998. Site-Directed recombination for repair of mutations in the human ADA gene. (Abstract) Antisense DNA & RNA based therapeutics, February, 1998, Coronado, Calif.

[0129] Huston et al, 1991 “Protein engineering of single-chain Fv analogs and fusion proteins” in Methods in Enzymology (J J Langone, ed.; Academic Press, New York, N.Y.) 203:46-88.

[0130] Johnson and Bird, 1991 “Construction of single-chain Fv derivatives of monoclonal antibodies and their production in Escherichia coli” in Methods in Enzymology (J J Langone, ed.; Academic Press, New York, N.Y.) 203:88-99.

[0131] Mernaugh and Mernaugh, 1995 “An overview of phage-displayed recombinant antibodies” in Molecular Methods In Plant Pathology (R P Singh and U S Singh, eds.; CRC Press Inc., Boca Raton, Fla.) pp. 359-365.

1 141 1 29 DNA artificial sequence misc_feature (1)..(29) primer 1 gggtccggca gaccgttcgc gggccggac 29 2 30 DNA artificial sequence misc_feature (1)..(30) primer 2 gagcggaccg caccgcgatc ggaacaacct 30 3 28 DNA artificial sequence misc_feature (1)..(28) primer 3 tctccggggc agcgcggtcg cggaacgt 28 4 8 DNA artificial sequence misc_feature (1)..(8) Octamer OC-OCT-003 4 ctcgccga 8 5 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-001 5 gtcggcga 8 6 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-002 6 ccagatcg 8 7 8 DNA artificial sequence misc_feature (1)..(8) octamer OC-OCT004 7 cgacatcg 8 8 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-005 8 gccgatca 8 9 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-006 9 gccaccga 8 10 8 DNA artificial seqeunce misc_feature (1)..(8) octamer OS-OCT-007 10 gatgccga 8 11 8 DNA artificial sequence misc_feature (1)..(8) octmer OS-OCT-008 11 cggcgaag 8 12 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-009 12 cggcgaac 8 13 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-010 13 ggcgatca 8 14 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-011 14 gccgagga 8 15 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-012 15 cgccgaca 8 16 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-013 16 atcgccga 8 17 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-014 17 ggcgaacc 8 18 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-015 18 gccgacca 8 19 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-016 19 gccaagga 8 20 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-017 20 cggcaacg 8 21 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-018 21 ggctggac 8 22 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-019 22 gcagcacc 8 23 8 DNA artificial sequence artificial sequence (1)..(8) octamer OS-OCT-020 23 ccagccag 8 24 8 DNA artificial sequence 
misc_feature (1)..(8) octamer OS-OCT-21 24 cgccgccg 8 25 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-22 25 cggcgacc 8 26 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-23 26 ccgccgcc 8 27 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-24 27 cgcggccg 8 28 8 DNA artificial sequence misc_feature (1)..(8) octamer OS-OCT-25 28 gtcggcga 8 29 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-001 29 cagctcggcg 10 30 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-002 30 gccggtgagc 10 31 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-003 31 ccgggtcgag 10 32 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-004 32 ggcgccgccc 10 33 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-005 33 ggcgccgccc 10 34 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-006 34 cgaggtcgag 10 35 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-007 35 cgagcaggcc 10 36 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-008 36 cgacgcgggc 10 37 10 DNA artificial sequence misc_feature (1)..(10) decamer OC-DEC-009 37 cctggccgcg 10 38 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-010 38 cctgcgcggc 10 39 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-011 39 acggccgcgg 10 40 10 DNA artificial sequence misc_feature (1)..(10) decamer OS-DEC-012 40 cgaggacgtc 10 41 20 PRT artificial sequence UNSURE (1)..(20) primer 41 Cys Gly Ala Thr Gly Cys Thr Ser Thr Ala Tyr Gly Ala Arg Thr Gly 1 5 10 15 Gly Cys Thr Ala 20 42 20 PRT artificial sequence UNSURE (1)..(20) primer 42 Thr Gly Gly Cys Gly Tyr Gly Thr Tyr Thr Gly Val Ala Cys Cys Ala 1 5 10 15 Thr Gly Thr Ala 20 43 20 PRT artificial sequence UNSURE (1)..(20) primer 43 Cys Cys Gly Ala Cys Arg Cys Thr Tyr Gly Cys Lys Gly Ala Cys Gly 1 5 10 15 Thr Ala Cys Ala 20 44 20 PRT artificial sequence UNSURE (1)..(20) primer 44 Cys Ala Thr Cys Asx Gly Gly Ser Gly 
Thr Ser Gly Thr Thr Ala Cys 1 5 10 15 Gly Gly Thr Ala 20 45 21 PRT artificial sequence unsure (1)..(20) primer 45 Thr Gly Cys Thr Ser Gly Thr Ser Gly Gly Ser Gly Ala Arg Gly Ala 1 5 10 15 Gly Cys Thr Gly Ala 20 46 22 PRT artificial sequence UNSURE (1)..(22) primer 46 Thr Cys Ser Ala Cys Tyr Thr Thr Gly Cys Cys Arg Thr Thr Gly Ala 1 5 10 15 Cys Arg Thr Thr Gly Ala 20 47 22 PRT artificial sequence UNSURE (1)..(22) primer 47 Thr Thr Cys Ala Cys Cys Gly Ala Arg Arg Cys Ser Gly Cys Gly Thr 1 5 10 15 Thr Cys Gly Thr Cys Ala 20 48 23 PRT artificial sequence UNSURE (1)..(23) primer 48 Cys Gly Cys Cys Gly Gly Ser Ala Cys Cys Ala Thr Ser Ala Tyr Cys 1 5 10 15 Cys Gly Gly Ala Thr Ser Ala 20 49 19 PRT artificial sequence UNSURE (1)..(19) primer 49 Gly Ala Thr Cys Gly Cys Gly Thr Gly Cys Gly Cys Ala Ala Gly Ala 1 5 10 15 Ala Ala Thr 50 22 PRT artificial sequence UNSURE (1)..(22) primer 50 Cys Cys Ala Cys Ser Thr Thr Thr Gly Gly Tyr His Thr Arg Gly Gly 1 5 10 15 Arg Gly Ala Thr Cys Gly 20 51 24 PRT artificial sequence UNSURE (1)..(24) primer 51 Ala Tyr Arg Cys Gly Thr Thr Cys Ala Ala Gly Tyr Gly Cys Met Gly 1 5 10 15 Cys Met Met Met Ala Cys Ala Gly 20 52 20 PRT artificial sequence UNSURE (1)..(20) primer 52 Gly Ala Thr Ala Ala Ala Thr Tyr Thr Gly Tyr Ala Cys Thr Gly Ala 1 5 10 15 Arg Cys Cys Lys 20 53 23 PRT artificial sequence UNSURE (1)..(23) primer 53 Thr Thr Cys Lys Ser Cys Gly Ala Gly Gly Ala Cys Cys Ala Cys Cys 1 5 10 15 Cys Gly Met Trp Gly Ala Thr 20 54 21 PRT artificial sequence UNSURE (1)..(21) primer 54 Gly Gly Ala Ala Gly Thr Ala Gly Thr Cys Gly Thr Lys Ser Gly Thr 1 5 10 15 Gly Ala Lys Gly Thr 20 55 20 PRT artificial sequence UNSURE (1)..(20) primer 55 Gly Ala Thr Gly Cys Ala Cys Gly Ala Gly Gly Thr Asx Ala Ala Cys 1 5 10 15 Val Thr Cys Thr 20 56 21 PRT artificial sequence UNSURE (1)..(21) primer 56 Thr Cys Thr Gly Gly Trp Ala Ser Ala Gly Ser Ala Cys Gly Gly Thr 1 5 10 15 Gly Ala Thr Cys Ala 20 57 22 PRT artificial sequence UNSURE (1)..(22) primer 57 Ala Thr Gly Cys 
Ser Arg Cys Gly Tyr Ala Ala Arg Cys Gly Gly Cys 1 5 10 15 Ala Arg Thr Thr Gly Thr 20 58 22 PRT artificial sequence UNSURE (1)..(22) primer 58 Cys Gly Ala Thr Arg Gly Cys Tyr Thr Cys Tyr Thr Thr Gly Cys Cys 1 5 10 15 Tyr Gly Thr Cys Ala Thr 20 59 22 PRT artificial sequence UNSURE (1)..(22) primer 59 Cys Gly Thr Thr Gly Gly Cys Tyr Cys Thr Thr Cys Ala Thr Gly Gly 1 5 10 15 Ser Gly Ala Thr Gly Ala 20 60 23 PRT artificial sequence UNSURE (1)..(23) primer 60 Cys Cys Thr Thr Thr Gly Gly Cys Thr Ala Thr Tyr Thr Cys Arg Gly 1 5 10 15 Cys Tyr Thr Cys Cys Ala Thr 20 61 20 PRT artificial sequence UNSURE (1)..(20) primer 61 Gly Ala Gly Thr Thr Cys Gly Ala Cys Gly Cys Ser Gly Val Ser Thr 1 5 10 15 Thr Cys Thr Thr 20 62 21 PRT artificial sequence UNSURE (1)..(21) primer 62 Gly Gly Thr Gly Thr Gly Asn Cys Cys Gly Ala Thr Gly Thr Thr Gly 1 5 10 15 Gly Ala Cys Thr Thr 20 63 20 PRT artificial sequence UNSURE (1)..(20) primer 63 Gly Cys Ala Arg Cys Gly Val Cys Thr Cys Cys Thr Gly Cys Thr Ser 1 5 10 15 Gly Ala Ala Ala 20 64 20 PRT artificial sequence UNSURE (1)..(20) primer 64 Thr Thr Gly Cys Thr Ser Gly Cys Arg Cys Cys Gly Thr Cys Cys Thr 1 5 10 15 Gly Gly Thr Thr 20 65 20 PRT artificial sequence UNSURE (1)..(20) primer 65 Ala Gly Cys Gly Thr Ser Thr Thr Cys Gly Cys Ser Gly Thr Gly Gly 1 5 10 15 Ala Gly Thr Ala 20 66 20 PRT artificial sequence UNSURE (1)..(20) primer 66 Gly Thr Gly Cys Thr Thr Cys Thr Cys Ser Ala Gly Met Ala Gly Cys 1 5 10 15 Thr Thr Gly Ala 20 67 20 PRT artificial sequence UNSURE (1)..(20) primer 67 Thr Ser Ala Ala Gly Gly Ala Gly Ala Cys Cys Gly Ala Arg Gly Ala 1 5 10 15 Ser Gly Ala Ala 20 68 19 PRT artificial sequence UNSURE (1)..(19) primer 68 Cys Gly Ser Ala Cys Cys Ala Cys Ser Ala Gly Cys Ala Gly Gly Thr 1 5 10 15 Thr Cys Ala 69 20 PRT artificial sequence UNSURE (1)..(20) primer 69 Cys Thr Thr Cys Thr Ala Tyr Gly Ser Val Cys Thr Gly Gly Ala Gly 1 5 10 15 Gly Cys Cys Ala 20 70 20 PRT artificial sequence UNSURE (1)..(20) primer 70 Gly Ala Gly Gly Arg Gly Ala Thr 
Cys Thr Gly Ser Ser Cys Cys Asp 1 5 10 15 Thr Gly Cys Thr 20 71 20 PRT artificial sequence UNSURE (1)..(20) primer 71 Gly Thr Gly Gly Gly Met Gly Ala Cys Gly Gly Tyr Thr Cys Ser Ala 1 5 10 15 Ala Arg Thr Thr 20 72 20 PRT artificial sequence UNSURE (1)..(20) primer 72 Thr Tyr Gly Cys Gly Cys Ala Gly Cys Ala Cys Gly Ala Thr Gly Gly 1 5 10 15 Ala Gly Ala Ala 20 73 20 PRT artificial sequence UNSURE (1)..(20) primer 73 Cys Ala Cys Thr Ala Cys Gly Thr Cys Cys Arg Ser Ala Cys Cys Cys 1 5 10 15 Thr Cys Cys Thr 20 74 19 PRT artificial sequence UNSURE (1)..(19) primer 74 Gly Cys Cys Ala Lys Asx Cys Cys Gly Thr Cys Gly Ser Tyr Cys Ala 1 5 10 15 Gly Gly Thr 75 19 PRT artificial sequence UNSURE (1)..(19) primer 75 Cys Ser Gly Tyr Ser Thr Thr Cys Gly Thr Gly Cys Gly Gly Ala Cys 1 5 10 15 Cys Ala Ala 76 19 PRT artificial sequence UNSURE (1)..(19) primer 76 Cys Gly Gly Thr Cys Ser Lys Ser Gly Ala Cys Gly Thr Arg Cys Thr 1 5 10 15 Cys Cys Ala 77 20 PRT artificial sequence UNSURE (1)..(20) primer 77 Ala Thr Cys Gly Thr Ser Cys Cys Tyr Gly Gly Arg Thr Ala Ser Ala 1 5 10 15 Thr Gly Thr Thr 20 78 19 PRT artficial sequence UNSURE (1)..(19) primer 78 Gly Ala Thr Gly Gly Ala Arg Cys Gly Ser Cys Cys Ser Ala Gly Ser 1 5 10 15 Ala Cys Ala 79 23 PRT artificial sequence UNSURE (1)..(23) primer 79 Cys Thr Thr Ser Thr Cys Cys Cys Thr Ser Ala Ala Tyr Cys Ala Ser 1 5 10 15 Thr Ala Tyr Ala Ala Gly Ala 20 80 20 PRT artificial sequence UNSURE (1)..(20) primer 80 Gly Cys Gly Thr Thr Cys Cys Arg Gly Gly Tyr Thr Cys Arg Ala Ala 1 5 10 15 Gly Gly Ala Ala 20 81 21 PRT artificial sequence UNSURE (1)..(21) primer 81 Gly Cys Met Thr Ser Cys Cys Ser Ser Thr Ser Ala Thr Cys Gly Ala 1 5 10 15 Gly Gly Ala Cys Thr 20 82 21 PRT artificial sequence UNSURE (1)..(21) primer 82 Gly Thr Cys Gly Ala Ala Cys Cys Gly Gly Gly Ser Met Ser Gly Gly 1 5 10 15 Gly Thr Cys Cys Ala 20 83 19 PRT artificial sequence UNSURE (1)..(19) primer 83 Cys Gly Gly Cys Gly Cys Arg Tyr Ser Gly Gly Arg Gly Thr Ser Thr 1 5 10 15 Thr Cys Ala 84 
20 PRT artificial sequence UNSURE (1)..(20) primer 84 Gly Thr Trp Cys Ala Gly Cys Gly Gly Gly Thr Thr Gly Lys Cys Gly 1 5 10 15 Thr Thr Cys Ala 20 85 21 PRT artificial sequence UNSURE (1)..(21) primer 85 Ala Thr Gly Ala Thr Gly Thr Gly Gly Gly Thr Tyr Thr Gly Ser Thr 1 5 10 15 Cys Ser Ala Gly Ala 20 86 21 PRT artificial sequence UNSURE (1)..(21) primer 86 Thr Thr Thr Tyr Thr Cys Arg Thr Cys Ser Cys Cys Thr Gly Thr Arg 1 5 10 15 Thr Thr Cys Ala Thr 20 87 21 PRT artificial sequence UNSURE (1)..(21) primer 87 Thr Cys Cys Tyr Gly Gly Tyr Cys Cys Lys Gly Thr Ser Gly Thr Met 1 5 10 15 Ala Thr Gly Ala Thr 20 88 21 PRT artificial sequence UNSURE (1)..(21) primer 88 Ala Cys Cys Tyr Thr Cys Ser Ala Ala Ser Gly Cys Arg Thr Thr Tyr 1 5 10 15 Ala Ala Cys Ala Thr 20 89 290 DNA artificial sequence misc_feature (1)..(290) probe 89 ccgacactcg cggacgtaca gacatgaacc tggcgcttgg gttagaagaa gataccggtc 60 tagagcactt gaagtaggag gaggccggaa ggggatgggg aaaacgctct cacaggtgga 120 caagggaacg aggcagggtc ttatcttaag gctgatctcc gagaagcatt ggggaccagc 180 aggatctggt cgcggccctg tcggaggctg gggttgaggt ggcccaggcg accgtgagtc 240 gggaccttgc gagctcgggg tcctaaaggt cggtaaccgc tatctccggc 290 90 267 DNA artificial sequence misc_feature (1)..(267) probe 90 ccgacgcttg ctgacgtaca ggcggaatgc agaacttcca cgggggcatt aaaacgttca 60 cgaaaacggg cgatagtttg cggtgtcagg ccgatttccg gcaccatcac cagcgcctgt 120 ttgccctgag cgagcacgtt ttccagtacg ctgagataaa cctccgtttt acgtaaccac 180 gcccgatgat cacgaattct ggatccgata cgtaacgcgt ctgcagcatg cgtggtaccg 240 agctttccct atagtgagtc gtataga 267 91 274 DNA artificial sequence misc_feature (1)..(274) probe 91 catctggggt ggttacggta caggcaactt cacgccatct gccgagttca acatctcttg 60 ccgacccaga agccgcacgc gtagtgttca cctccggcgt tccattagtg atgatgggcc 120 tcgatctcac aaccagaccg tttgcacccg gacgtgattg ctcggatgga aaggcaggcg 180 gcccgccgga gagctgttca gcgacatcat gaacttcact ctcaaaacgc agtcgaaaac 240 tacggccttg ctggcggccg gtgcacgacg ccac 274 92 293 DNA artificial sequence misc_feature (1)..(293) probe 92 
ccgacgctcg cggacgtaca tgacaaacct ttatttcaag attaaagaag ataagcgcaa 60 ggctgcgaga ggtgaataat gcctccatca cttacgcaaa agccgcttgc tgctgctcat 120 tggtggcgcg acgcaattgc tcatagcact cacgtgttaa tcactcggcc caatggtaac 180 cgatggtccc tggaagatgt ccagccctac cataccatca ccaaagatat tgttggtgta 240 tggcactgta tgctcaccgg acacaccgga aaagaccatc attgctccgg taa 293 93 95 DNA artificial sequence misc_feature (1)..(95) probe 93 gaggcgatat cattttctac aggaatacgc accaaagact caatcagatt gcgtccaaca 60 agaccggcat tgcatcgggg gtggttacgg tattc 95 94 105 DNA artificial sequence misc_feature (1)..(105) probe 94 catcgggggt ggttacggta taaactgcgg cttcttcttt ttcttctttc ttcttgttac 60 acgctgtaaa caacagaaga ctgcttagcg caatacttgc gacaa 105 95 270 DNA artificial sequence misc_feature (1)..(270) probe 95 ccgacgctcg cggacgtaca tcttcaagcg tccatcgctc ggcatggtga tgtggatctg 60 gatcagcgtg atgaacccgc atacgcaagg gtggggcttc gcgcgcgaag cgttcgccgc 120 catcatcgcg gtgacgacgg tcgccgccat ggccacgaac gcgtaccgga ataccgtaac 180 cacccccgat gatcacgaat tctggatccg atacgtaacg cgtctgcagc atgcgtggta 240 cgagctttcc tatagtgagt cgtatagagg 270 96 126 DNA artificial sequence misc_feature (1)..(126) probe 96 gatcgcgtgc gcaagaaatc tgcgccgcct ggcagggtcg agttgtcggc tggtactgca 60 cagatctgac ccctgaaggc tatgccgtcg agtccgagtc tcaccccggc tcagtacaga 120 tttatc 126 97 127 DNA artificial sequence misc_feature (1)..(127) probe 97 gataaatatg cactgagccg gggtgagact cggactcgac ggcatagcct tcaggggtca 60 gatttgtgca gtaccagccg acaactcgac cctgccaggc ggcgccagat ttcttgcgca 120 cgcgatc 127 98 127 DNA artificial sequence misc_feature (1)..(127) probe 98 gataaatctg tactgagcct ggatgcgact cggactcgac ggcatagcct tcaggggtca 60 gttttgtgca gtaccagccg acaactcgac cctgccaggc ggcgccagat ttcttgcgca 120 cgcgatc 127 99 275 DNA artificial sequence misc_feature (1)..(275) probe 99 cctttggcta tttcggcttc catctcgacg acagtgaagt tctccctgta gaagaagtcg 60 aggtacccct cgttctgaag aaatgtccct ttgaccgtgg accgcctttt ggttatcgag 120 cgcggcgcca taatccgagg gtatgggggc gaggtcggca taggctggaa cgcatttcgg 180 
aaccaggtag gtgggttccc gggaggtggc ctcggtcata ggacaacgtc cgaggatcat 240 tcacgtcgcc caatgggcgg cccggttggg gccgt 275 100 286 DNA artificial sequence misc_feature (1)..(286) probe 100 cctttggcta tttcggcttc catcgtggcc tccttgcctg tcatccactt cttcaacaga 60 gatatttgag aaatcagaaa tttctgtctt taaaggagat gtctggctgc gggaaccgat 120 catctgtagc tgtgttctta taatattctg aatttttgca cgcttgtttc ttctgctttt 180 ttttctaaag ataccagaat agcaaccaaa ggcagcaagc agtacaacaa ctgccgtttg 240 gcgccgcata tctgaattcg tcgacaagct tcttgagcct aggcta 286 101 272 DNA artificial sequence misc_feature (1)..(272) probe 101 cctttggcta tttcggcttc catgccgtca tgacaccctc ctggtgttcg tacaattttt 60 cttttatcac ctttgcgccc tgttcttctt ctacaccgtc aacggactta ctaccatcgg 120 taaatggccg cggcgtatca tattcgccct cttatttctc aaccctgcca tcctttatct 180 caggtaacta tatcaccagg tgacgtctat ttcatcgcca tgaagggccc acgatctgaa 240 ttcgtcgaca aggcttctcg agcctagggc ta 272 102 101 DNA artificial sequence misc_feature (1)..(101) probe 102 cgatggcctc tttgcctgtc atttttcgat cactaccacc gggcgtgcca gtcgtattgc 60 cagcgcctgt gccgtctcgc cttgtgtcta atcaataaaa c 101 103 262 DNA artificial sequence misc_feature (1)..(262) probe 103 cgttggccct tcatggcgat gatgtcagca ccacgcctgt cgcggcgctc aaaagatagc 60 tgtggccgag catgacggga aacatgctgc gatcctgtgc gacacggcgg atcagcgatt 120 cctgcgaacc gataccgcag atctggacgc caagattggt gacggccgta tggaggcgta 180 aagcgcgaat tgttcgacac cgagatgacg ggcaaggagg ccatcatcgc catgaagagc 240 cacgatcacg aattctggat cg 262 104 287 DNA artificial sequence misc_feature (1)..(287) probe 104 cgttggctct tcatggggat gatggtcagc accacgcctg tcgcggcgct caaaagatag 60 ctgtggccga gcatgacggg aaacatgctg cgatcctgtg cgacacggcg gatcagcgat 120 tcctgcgaac cgataccgca gatctggacg ccaagattgg tgacggccgt attgaggccg 180 aaatagccaa aggatctgaa ttcgtcgaca aggcttctcg aggcctaggc tagggctcta 240 ggaccacacg tggtgggggg cccagctcgc ggcgcacaat tcactgc 287 105 290 DNA artificial sequence misc_feature (1)..(290) probe 105 cgttggctct tcatggggat gatctgccgg cctgaggggc tgcgcgcacg gaggaaagat 60 
aaggctcgta ggtcatggcc gcgtcgttct ggccggcgat gaaggcctga gcggcaggac 120 ccggctccat gttgacgacg gtcacgtcct tcacggagag accgttcttc ttcagcatcc 180 aggagagggc gaaatagggc gacgtgccgg gcgcggaggc cgccacctgc tggcccttga 240 tgtccttgat ggaggccgag atagccaaag gatctgaatt cgtcgacaag 290 106 285 DNA artificial sequence misc_feature (1)..(285) probe 106 ttgctggcgc cgtcctggtt ggttgcacat tgccattacc cattacgatg gtaatcatca 60 ccgcgatagc gcaaattgca ccgcctcctg cggctgtttt tcccttcata aagacctcat 120 aagcgaattt ttacgctcca ggacaaacac ccattcacag ccaataccga ctgactcatc 180 cctttagaag acacaggata atgcaaatca cttgttagct acgtttcaag atatacatta 240 ttgctctaat taattatttt tattagggat agataggtgg accat 285 107 271 DNA artificial sequence misc_feature (1)..(271) probe 107 ttgctcgcgc cgtcctggtt tgcgagacgc aggaggcaag tcgcagtacc agtcgtagaa 60 gcttaagcaa gtaccgccaa tcagcgagag atagcgtgca cccgatgcgt aagaaaccat 120 cgacattgcc ggaattggcg agaaccagca acacggtccg ggccgctagt ttttgatggt 180 gtaacgttag atgcggcgat cagttcgttc acctcctgcc aggagaacga acaatccacc 240 gccgtcacgc gcctgcttaa ggctttgcgc t 271 108 269 DNA artificial sequence misc_feature (1)..(269) probe 108 aaggagaccg aagaggaaca tgggctggaa gaagatggaa aagccaaaag gaacctttta 60 cgtatgtggc gtgtagaatt ccgagaaacg tttgagaaca tctcaccaat tctccgatta 120 cttgctggag catgctcatg tcgttgtcac accgggcgaa aatattcgga agcacggaaa 180 aaggcatgtc agaatatcga tggtgtcgaa gcaggaggat ctgcgggaat ttgtcatgcg 240 gattcaaagc tgaacctgct cgtggtgcg 269 109 281 DNA artificial sequence misc_feature (1)..(281) probe 109 tgaaggagac cgcaggagga acacggaaat aagaactgat gtgctcgcag gaaataaaga 60 cacagggaaa tatgatcata gatactcaaa cattccttaa ctatagggag cagagcgagg 120 cattaaaggc ctggcagaaa tcaaatccta aggaaggtga atcattacca actatttcaa 180 caatatcaga attgaataag aaaaaatata ttttgagaaa ttgccacaaa aagctgtcta 240 ttttggacag cttttataaa ctactgaact gctagtggtg c 281 110 457 DNA artificial sequence misc_feature (1)..(457) probe 110 ttgcgcacga cgatggagaa ctgatttcgc tgtcctcgat ccggcagtcc tcggagacgg 60 acgtgaacgg accgatgtag gcgtcgttga 
ccaccgtgcc ggcaccgatg atggcaggcc 120 cgacgatgcg gctgcacact gacgctggcg ccgcctcgac ccggacccgg ccgatgatct 180 cgctgctctc gtcgaccgtg ccctcgacca ccggctccga cggtcctcca aggaccgacc 240 ggtggacctc caagcatgtc gggtaacgtt gccggtgtcc ttccaggtag ccgagagaaa 300 ggtccgttgg aggcgaacgt acacggctgg ctgggtcaat acagcacact gtgaatggcg 360 tggtgggtga aaattctatc aggctcggcc ggcgcacaga gaccggctca tatatagacg 420 caggacggcg ctcttggtga attgccggtg ataaaaa 457 111 302 DNA artificial sequence misc_feature (1)..(302) probe 111 caaagcggaa gcgatgcgga tgttgcgggc aaaagtcgcg gccttagcgg cgcaacaggc 60 tgctgatgaa aatgatcgaa tggcgtcgag gaccagggtg gcgtactggt cggataggat 120 cgggctcgag gtggcgtacc ctataacttt cccgaggaat cgcctggacg ggatcatcgt 180 aattgggtac aagttccagg aacttgacca gagttctggc tggcggacct aggtggatgg 240 tctaggacgc ggctccatgc cgataggtgg agggcgtgga tggcacaacg gccgaaggtc 300 ag 302 112 268 DNA artificial sequence misc_feature (1)..(268) probe 112 tgcgcacgac gatggagaaa gccggctata tggacgaaga tttcttccta tatgccgaag 60 aagtggagtg gtgcagccgt ttacgtaagc tgggcgaatt agcgatcttt ggagacatca 120 acattattca ccttcagggt gagaccaccg gagacgcctt tgactcagcc gataaggcta 180 ctacggcctg tatgaccgta aaggcctcca gctcatgtta tccaatcatg tcagggtcag 240 aaacaattcg gggcacgctg gtacttat 268 113 276 DNA artificial sequence misc_feature (1)..(276) probe 113 tcgcgcacga cgatggagaa ttgggtatcg tgaaggtata attgaggagg agtataaagt 60 accagcgtgc cccgaattgt ttcctgaccc tgacatgatt ggataacatg agctggaggc 120 ctttacggtc atacagggcc gtagtgagcc ttatcggcgt gagtcaaagg gcgtctccgg 180 gtgggtgctc acccgggggg tgagtgatgg tggatgtgcg ccaaagagtt cgggctaatt 240 gggggcagcg ttacggtgga acgggctgcg aggcac 276 114 281 DNA artificial sequence misc_feature (1)..(281) probe 114 cgtcgtttga gcggacatgc gctaaaggca gtgaaattat ccgcctggtt gaagaaagcg 60 atccggtagc ggaactggca ttgcgtcgct acgagctgcg gctggcaaaa tcgctggcac 120 atgtcgtgaa tattctcgat ccggatgtga ttgtcctggg gggcgggatg agcaatgtag 180 accgtttata tcaaacggtt gggcagttga ttaacaattt ggtcttcggc ggcgatgtga 240 acgccggtgc gtaggcgacg 
acgggtgaat cacgagttct g 281 115 286 DNA artificial sequence misc_feature (1)..(286) probe 115 gcttgcgggt ggagatgatc ttggcctcgg ggatcgttca tcgccaggat caccggcctt 60 ggacgtcggt tcatttccag gctctggcca ggaacatctg ggtcttcggc gtcggcgaac 120 aggatgcggc ggcctcggcg gtattgcgct cgacatcacc gggtcggagt cggggctgac 180 caggcgatag cctttggcac ttcaggtggg tctaggcggc cgggccggtg gcgggccatg 240 cccatgatca ggatctgcgc atcgccagcg accaccggtt gctcgt 286 116 262 DNA artificial sequence misc_feature (1)..(262) probe 116 cgcagcacga tggagaagtg tggaacgtct gctctacaat gcctttacgg gcatcgatca 60 ggatcaggga aacctgcgca ttggaagcac cggttaccat gttacgggta tattcgatgt 120 gacctggtgt atctgcaata atgtatttac ggctgggggt ggaaagtaga tatggccaca 180 tcaatggtga tacctgttca cgttcagcca caaggccgtc tgtcagcaat gacaggtctg 240 taaaatcaag tcctttgcgt tg 262 117 279 DNA artificial sequence misc_feature (1)..(279) probe 117 cgcgcacgac gatggagaat cgatcgggtc cgccttcaac gatctgttga ttggcagcac 60 agtctcgaac cggctcgaag gtgggaacgg caatgacacc ttccgcggca cgcggagcag 120 acgtattgat cggtggtgac ggcacgggcg acacggcaga ctattcagcg tcctcggccg 180 gcatcctggt cacgttgact gccatctcca acggagcaac agacagggtg ccgggggggg 240 accgaactta gagtgttcta atgcgagcta gagccatgt 279 118 288 DNA artificial sequence misc_feature (1)..(288) probe 118 tcggtcggga cgtgctccac acgcgagcaa caactaccag gacaagcacc aggccctgtc 60 ccgctatgcg aacgtgatga cgtgcagccg caccaaggtg ccctggcgcc cgggccgcgg 120 ctacaacagc agcgaaccga agatctacgg cttgcagacc gccacgtggt cggcccggcg 180 gcgaggaaat ctacaccgac gaatatggcc gggtgcgcgt gcagttccac tgggaccggg 240 agggcgcgaa cgacgagcgc agggtcagcc tggataccgc gtccgcac 288 119 289 DNA artificial sequence misc_feature (1)..(289) probe 119 ccgcgttcgt gcggaccaac tggcgatata gaacgggcca gggcaggccg ttggctgcga 60 aaatagggcc tggtcctatc ggcgggctgg atctccaggg tgcgcatcct gatgaggctg 120 agagttggca gggtagcccg gctgcgacca ggcagggtga ccgggtcggc gagcatttcc 180 attgatacag ttgctctggt gagcagggct tttccagggt cgtcctgcgt ccaggtgtcc 240 gacgggtgat gggatggagc cagctgggaa ggactggtga gccactctg 289 
120 298 DNA artificial sequence misc_feature (1)..(298) probe 120 catggataac gcctggcagg gaacacctac tgtccaacgt cggtctgttc cgaagggtgg 60 tgtcaatcag gtgggtggat cagagtgggc tacaaggtcc ttccagctgg ggtcatccca 120 ttaccgggtc ggacactggg agcaggacga cctggaaaag ccctgctaca ggagcaactg 180 tgatcaatgg acatgcttgg ctgaccggtc accctggctg ggtcgagccg gctacctgcc 240 aatctcagcc tcatcaggag tgcgacctgg gagatcaagg cggccatagg acaggcca 298 121 296 DNA artificial sequence misc_feature (1)..(296) probe 121 cactacgtcc gcgaccctcc tgaagtcggc agcaatcttt tccagcccgc ccagcgacat 60 ttcattttgc tgcgcgatat aggcgtcata cagagccatt tgctcgttgt atttcgctat 120 ctgtgcatct gttggttcat ccggtaactc tttcggcggg tttaaccgct ttcagtttct 180 tacggtttta cctgcctcgg caaaccgtct gagcattcag gatccccacc tttgaagggt 240 caaggttaag gggcattgca gataatgcgc ttgagcttct ggtgctgcgt ttttta 296 122 300 DNA artificial sequence misc_feature (1)..(300) probe 122 cggtgttcgt gcggaccaaa aggaacgaat ttcagacatc agcacccaac tgaacgcctt 60 tcccggctgt gaagttgctg tcagcgacgc gccgagcggt ccagttgatt gtggtggtgg 120 aagcagaaga cagcgaaacg ctgatccaaa ccattgagtc agtacgcaag tagagggcgt 180 gctggcggtg tcgctgggtt atcaccagca ggaagagcaa ggtgaggaaa caccatgaaa 240 ctcagtcgtc gtagctttat gaagctacgc cgttgcggcg ctgcggcgcg tgccggtctc 300 123 271 DNA artificial sequence misc_feature (1)..(271) probe 123 cggtccggga cgtgctccag acgctgcgcg accgcgaata tgtgaagacc gaaaagaagc 60 ggctcgtccc cgaggacaaa ggccggatcg tcaccgcctt cctggagagc ttcttccgcc 120 gctacgtgga atacgacttc acggcggatc tggaggagca gctcgaccgc atctccaatt 180 ccgagatcga ctgggaagca ggtgcttcgc gatttctggc gcgacttctc ggcagccatc 240 ggcgagacga agagctgccg caccgcggag t 271 124 256 DNA artificial sequence misc_feature (1)..(256) probe 124 cggtcgtgga cgtgctccag gcgacctcgt ccaggctgag gctgatttca tcgagccagg 60 cgagatagca gttgaggtcg tcggtgtagg tggcgatcgt gcccaccgac gcccccagct 120 ccgtgggcca gggcatcgag ccagaggtgg gctaatcgct gattggtccc acgaagacca 180 gcgttcgtgc ggaccaacag gggccgtact cctgtattct ttcagaagga tctggggaag 240 actcgaactt gctgga 256 125 282 
DNA artificial sequence misc_feature (1)..(282) probe 125 ccagaattcg tgatcggtgt tcgtcggacc aatcccgctg atcacctcga cctcacgcat 60 aggatcggat caggtgctga tctcgcaaac ccttaggacc tgtcgtcaga gcgaagggga 120 gggggactgt tattccacca tctctgtgtc gaactcggcc agagtgctcc gcgctgtgat 180 cagatcctcc aggcttctca atcgggcgat aaggcgatcc agccgcggtg tgagaaagat 240 caggtagcgg cttggttctc cgacctgtag tgatgcgcca gc 282 126 287 DNA artificial sequence misc_feature (1)..(287) probe 126 cggtcgggga cgtgctccag gaaatgaact gatccttctc cggcttgccg cgggcctgct 60 gatagtagcg gatgaagcgc acgctggaat cgaccgcgtc cgatccgccc agggtgaaat 120 agatgtggtt gagatcgccc ggcgcccgct cgggctagtg ccgaggcagg cggatgcccg 180 gctccgcgcc gaggccgaaa tagccggtcg cataaggcag ctcccgcatc tggcggctgg 240 cggcttccac gagtgctggt catgggccgt agccgggcgt tgacgcg 287 127 413 DNA artificial sequence misc_feature (1)..(413) probe 127 gcgttccggg ctcgaaggaa acgttcagaa ggtcagctat atcggccggc gattctttgc 60 ttcgtacctg cgcgacggcc gcaccgaagt aaggatgtac gatgaggcgc agagtctggg 120 cgtcgtaccc tgccgggtct cgggcgcgct gtcggttttg aaggacggaa acgagacgga 180 acgttctacg tattcacaag ctacacggtt ccctccgttg tttaccacta cgatttaaag 240 accacaagag cactcttgga agcaaccgaa ggtcgacgcg gatctacgaa atatgagacc 300 agcctcgtct tctacaacac aaagatggca cgcgcgtacc attgttcatc accgcgcgca 360 aggatataag ctggacggaa cagaatcccc tttaccctat gatacggcgg atc 413 128 300 DNA artificial sequence misc_feature (1)..(300) probe 128 gcgttccggg ttcgaaggaa gcaacttcca gcaggcggaa cgcctcatcc ctggcatcgc 60 atttcgctga tatcgttcaa ccgttcaacg cgcacgttgg taatttccaa cagaatgcgt 120 gatgcccatc gcggcatgtg aattgatgga cgccacccac catcaaactt tcattcacag 180 gtgtgaggtt tccaggtcgg gcatcatcgg gtatcgacca taaggccgta atcaccaggg 240 tttttggtcg ggaactgggc cgaataaatc cttgctgcgg ttcttctcat ctgccacgac 300 129 290 DNA artificial sequence misc_feature (1)..(290) probe 129 gcgttccggg ttcgaaggaa ggcttggact taatgagcaa ggagcggagg taatcgaaat 60 ggcaccattt ccaatcgaaa cgatactggg gaaagccggc gccctctctg tcttcctgtt 120 catcggagtc gcctttggat gggtgttgga 
gaacgccgga ttcggcaact caccaagctg 180 gcagcacagt tttatttcag agagatgacc gttctcaagg tcatgttcac ggccatcgtc 240 gtcgccatgg tcttgatatt cgcgacttca ggtctggggc ttctagacta 290 130 264 DNA artificial sequence misc_feature (1)..(264) probe 130 gcgttccggg ctcgaaggaa tactgtctca tgaacaggat atgctgcgtc ttcgcatcat 60 gatctggcgc actcttgcga ccgacacctt tgacatcgct ctgccggtta accagtcctt 120 tgatgtatgg gcaaccatca ttcgtggcaa attccagact gtatatcgcg acattattag 180 cgcgttaaat cttctggtgc gatggggatg tttgctggtg ctgatgcagc atctttcttc 240 aaacagttgc cgaaggattt cttc 264 131 273 DNA artificial sequence misc_feature (1)..(273) probe 131 ggcggttccg ggatcgaagg aaccgttcag aaggtcagct atatcggccg gcgattcttt 60 gcttcgtacc tgcgcgacgg ccgcaccgaa gtaaggatgt acgatgaggc cggcaagagt 120 ctgggcgtcg tacctctgcc gggtctcggg cgcgctgtcg gttttgaagg agggaagacg 180 agacggaaac gttctacgta ttcacaaggc tacacgggtc cctccggtgg tttaccacta 240 cgagttaaag acccacagga gcactccttg gga 273 132 261 DNA artificial sequence misc_feature (1)..(261) probe 132 cgtagagatg gggtctctcc atgtgcccag gctgttatcg aactcctggg ctcaagtgat 60 ccttctgcct tggtctccca aagtgctagg gttaaaagtg ctggggttat aagtgtgagc 120 cactgcctct agcccagttt tttagttctt gttacaaatt gccaagtaag gactaatcca 180 aaagactgga gtattttgtc aatgaacatg tttcaacata tgtatctctt acaaaatgca 240 gctggtttaa atcctaaagg c 261 133 285 DNA artificial sequence misc_feature (1)..(285) probe 133 cagcgcggca gtgggtgggt tattgctctg ttagctgtgc tggtactggt tggagccggg 60 gtgttcttct acgtcaaggg gatgcccgga tctcattcgg atgccgctcc tcaaccaacc 120 caggcaccaa tctctacctc tacgccagag gtcaggccaa cgcgaactgt gacgctcatg 180 ccacggtgac aacgatgagt tctcccatac agatccagct tcctggcggg gcggtggagt 240 gtggacaagg ggccttgatc gcaaatcctc gcaccacctg tctct 285 134 280 DNA artificial sequence misc_feature (1)..(280) probe 134 gttcagcggg ttggcgttca gaagcagcgt ggctgggtgc cggatggtgc gatccacgcg 60 atcgccgatg tgctgggtat tccggcaagc gacgtcgaag gtgtggcacg ttctacagtc 120 agatcttccg ccagccggtt ggtcgccatg tgaatccgtt attgtgacaa gcgtgtctgt 180 catatcacgg tatacaggta 
atcggcgcgc tcgagaaaag ctgactcacc gggcacgaca 240 tttgataggc gcttaagctg ctgccactgc tgctgggact 280 135 271 DNA artificial sequence misc_feature (1)..(271) probe 135 gttcagcggg ttgtcgttca tggccagacc agcagcgtat gctcctccag ggcttttgcg 60 atgggcacac cgcgggacat ggcctgctgc tcgcaagttt ccgcgtctct gtccggatcg 120 gcgcccgaag tgacccgtga acagcgccga gtccttcagg cccgcctctc gcgaacatcg 180 ccgagcgata cgcccgtcca ttccgcgcac gcgaccccgc cattggtcca gggattgcct 240 ccgccttcgg ctccgaagaa cgagcggccg t 271 136 236 DNA artificial sequence misc_feature (1)..(236) probe 136 gttcagcggg ttggcgttca gggattggtg catttgcctg cccttgctgc ctggaaccct 60 gaaaatcccg gtgactttgg cggtttgggc atgagcagtg acgagtcagc cattttctat 120 gcaatcggta ttggcgatgg cagctgggga gcattttatg atgtttgctg cctgtacccc 180 tacgtacggc aatctttggc tttagcagtc atttgcagtt ggtgcatggc cgtgtg 236 137 264 DNA artificial sequence misc_feature (1)..(264) probe 137 tcccggtccg gtggtcatga tccttcgccc tctgctcacg aaagatgctg tccgcccatc 60 ggaagaactc actatttcgc ggttgtgttg gtgggatccc ccggagcccg catcgcgcgt 120 gcgcatgagc tcattcgaga ggtgggcgac gagacttgag aggaaagcgc tggcgccggg 180 tgatggaagg cacacagtgc tcaacgcgga cgataccgat tggtccatct gtttcgtata 240 ggtccatgtg cttctcaact acat 264 138 301 DNA artificial sequence misc_feature (1)..(301) probe 138 accgtgtcga aggcgtttaa catgccggtg gatgagttac agggaagtgc agagcgactg 60 aagaaacgcc tcgagaatat gggtgagatc aaccctaccg caattgaggc gtacctggaa 120 atgaagaaac gttacgaatt catacttgaa acagaaagac ggatcttgga tctggaattc 180 gttcggacaa agctttcttc ggagcctagg ctagcttcta gaccacaacg tgtggggggg 240 cccgagctcc cggccgcaac aatttcacat tgggccgtcg tttttacaac gcttgttgtc 300 a 301 139 267 DNA artificial sequence misc_feature (1)..(267) probe 139 tcctggcccg gtcgtcatga tgttcacgtt attatgtagt ctgccggaca ccttattaca 60 ggatgagtat cagcagaaga gtgtgaacta tcaggcgcgg tgacatctgt gtggactaca 120 gtcagcatac tgactgcgct gtgatggctc tacgatgctc gcgaaaaaca ccccccatac 180 catatccgag cgagcgtgat tataacaacg tgcttccgac aagcgagagc ctcgcgctct 240 ggatagagat acatcgtgtc agattac 267 
140 293 DNA artificial sequence misc_feature (1)..(293) probe 140 accctcgaag gcgttcaaca tcgccttcag ccttcattct cagtagttaa tgccatctgg 60 atggaaaaca gaggaatcta ctgctgtacc gacacatacg acggaggagg tgaatatcgg 120 cttgaaaatg gcatcgatgc gcggagacaa cagatgcagc aaaggagaaa tgatgtttga 180 agactactct tgcctgccag ggagagtaca tgccgaaagc agaaggcgta cacatcaaaa 240 gagatacatg gcgataatac ggaggataca acaggcggga acatgctgtg atg 293 141 251 DNA artificial sequence misc_feature (1)..(251) probe 141 tcctggtccg gtcgtaatga ttccgagctc gtcagcaatt tcagtactac ggaactgaaa 60 cttgtcagcc tcatcgggac ctattattat acctattcta cctgcagcct tattgccgga 120 attggcctgg ataagttcgg tggcaaaaga tcgctttttg caggtgcttt aattctggga 180 ataggctgtc tgtaatttct ttgcatctcg cttattcagg tgtgtgttgc aggaagattg 240 ttgcagggag c 251 
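The primer entries in the listing above are typed as PRT and written with three-letter amino acid abbreviations. Their lengths and composition suggest that each residue stands for the nucleotide, or IUPAC degenerate nucleotide, code sharing the same one-letter symbol (for example, Cys Gly Ala Thr read as C G A T, and Arg, Tyr, Ser read as R, Y, S). The following is a minimal sketch under that assumption; the mapping table and the function name decode_primer are illustrative and are not part of the original listing.

```python
# Three-letter amino acid abbreviation -> one-letter symbol, interpreted here
# as an IUPAC nucleotide code. ASSUMPTION: the listing encodes nucleotides
# this way; the original document does not state it explicitly.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Gly": "G", "Thr": "T",
    "Arg": "R", "Tyr": "Y", "Ser": "S", "Trp": "W",
    "Lys": "K", "Met": "M", "Asx": "B", "Asp": "D",
    "His": "H", "Val": "V", "Asn": "N",
}


def decode_primer(listing_text):
    """Convert a flattened listing entry to a one-letter sequence.

    Position markers such as '1 5 10 15' and '20' are skipped; any
    unknown three-letter token raises KeyError rather than being guessed.
    """
    tokens = listing_text.split()
    return "".join(THREE_TO_ONE[t] for t in tokens if not t.isdigit())


# Entry 58 as it appears in the listing, including its position markers.
entry_58 = ("Cys Gly Ala Thr Arg Gly Cys Tyr Thr Cys Tyr Thr Thr Gly Cys Cys "
            "1 5 10 15 Tyr Gly Thr Cys Ala Thr 20")
primer_58 = decode_primer(entry_58)
print(primer_58, len(primer_58))
```

The decoded length for entry 58 matches the 22 residues declared in its header, which is consistent with the assumed reading but does not prove it.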

What is claimed is:
 1. A method of targeted cloning and enrichment of genes and gene clusters, comprising: directly isolating and subsequently cloning the targeted genes or gene clusters.
 2. The method according to claim 1, wherein said isolating step includes the steps of: creating a primer containing a target oligonucleotide; adding the primer to a sample of DNA; and performing PCR to replicate the genes targeted by the primer.
 3. The method according to claim 2, wherein said creating step further includes creating a primer using k-tuple, template derivation and degenerate PCR.
 4. The method according to claim 2, wherein said performing step further includes performing degenerate, nested, and temperature gradient PCR.
 5. A primer for use in the method of claim 1.
 6. The primer according to claim 5, wherein said primer is selected from Table I.
 7. A method of isolating trimethoprim coding genes by directly isolating and subsequently cloning the trimethoprim coding gene.
 8. The method according to claim 7, wherein said isolating step includes the steps of: creating a primer containing a target oligonucleotide using a method selected from the group consisting of k-tuple, template derivation and degenerate PCR; adding the primer to a sample of DNA; and performing PCR to replicate the trimethoprim coding gene targeted by the primer.
 9. The method according to claim 8, wherein said creating step further includes creating a primer containing a target oligonucleotide coding for DHFR2.
 10. Probes for use in the method according to claim 1.
 11. The probes according to claim 10, wherein said probes are selected from Table II.
 12. Genes cloned by the method according to claim.
 13. The genes according to claim 12, wherein said genes are selected from Table II.
 14. A library formed by the method of claim 1.
 15. A method of providing degenerate cloning of an entire family of genes from a mixed DNA sample by directly isolating and subsequently cloning targeted genes/clusters.
 16. The method according to claim 15, wherein said isolating step includes the steps of: degenerately cloning a target oligonucleotide; creating a primer containing the target oligonucleotide; adding the primer to a mixed sample of DNA; and performing PCR to replicate the genes targeted by the primer.
 17. Genes cloned according to the method of claim 15 for use in affinity purification of genes.
 18. The genes according to claim 17, further including using said genes for cloning associated biosynthetic pathway genes. 
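
Several of the claims above recite degenerate primers and degenerate PCR. A degenerate primer is shorthand for a pool of concrete oligonucleotides, one for each combination of bases allowed at its ambiguous positions. The sketch below illustrates that idea with the standard IUPAC nucleotide ambiguity codes; it is not the primer design procedure of the invention. The function names degeneracy and expand are hypothetical, and the example string is a one-letter reading of primer 58 from the sequence listing, on the assumption that the three-letter residues there denote IUPAC nucleotide codes.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct concrete sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


# Assumed one-letter reading of primer 58 (illustrative input only).
primer = "CGATRGCYTCYTTGCCYGTCAT"
print(degeneracy(primer))
for seq in expand(primer)[:4]:
    print(seq)
```

Because the pool size is the product of the per-position choices, it grows quickly with each ambiguous base, so checking degeneracy before calling expand avoids enumerating very large pools.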